IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 35, no. 8, Aug. 2013

pp. 1829-1846

R. Memisevic, Dept. of Computer Science and Operations Research, University of Montreal, Montreal, QC, Canada

DOI: http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.53

ABSTRACT

A fundamental operation in many vision tasks, including motion understanding, stereopsis, visual odometry, or invariant recognition, is establishing correspondences between images or between images and data from other modalities. Recently, there has been increasing interest in learning to infer correspondences from data using relational, spatiotemporal, and bilinear variants of deep learning methods. These methods use multiplicative interactions between pixels or between features to represent correlation patterns across multiple images. In this paper, we review recent work on relational feature learning and analyze the role that multiplicative interactions play in learning to encode relations. We also discuss how square pooling and complex-cell models can be viewed as a way to represent multiplicative interactions and thereby as a way to encode relations.
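The claim that square pooling represents multiplicative interactions rests on an elementary identity: squaring a summed two-image filter response, (u·x + v·y)^2, gives the two single-image squares (u·x)^2 and (v·y)^2 plus the cross term 2(u·x)(v·y), which is a product of responses to the two images. The sketch below illustrates that identity and a factored layer of "mapping units" that pools products of filter responses across two images. It is a minimal pure-Python illustration, assuming images are flattened into lists of floats; the function names, filters, and weights are hypothetical toy values, not code or learned parameters from the paper.

```python
# Minimal sketch (hypothetical names and toy values): multiplicative
# interactions between two images x and y, and their link to square pooling.


def dot(a, b):
    """Inner product of two equal-length sequences of floats."""
    return sum(ai * bi for ai, bi in zip(a, b))


def factor_products(x, y, U, V):
    """For each factor f, multiply filter U[f]'s response on x by
    filter V[f]'s response on y (a multiplicative interaction)."""
    return [dot(u, x) * dot(v, y) for u, v in zip(U, V)]


def mapping_units(x, y, U, V, W):
    """Hypothetical mapping layer: unit k is a weighted sum (row W[k]) of the
    factor products, so it depends jointly on x and y, not on either alone."""
    products = factor_products(x, y, U, V)
    return [dot(w, products) for w in W]


def energy_response(x, y, u, v):
    """Square pooling (energy-model style) of a summed two-image filter response."""
    return (dot(u, x) + dot(v, y)) ** 2


def cross_term(x, y, u, v):
    """Energy response minus the two single-image squares; algebraically this
    equals 2 * (u.x) * (v.y), the multiplicative part of square pooling."""
    return energy_response(x, y, u, v) - dot(u, x) ** 2 - dot(v, y) ** 2


# Toy example with made-up 4-pixel "images" and filters.
x = [1.0, 0.0, 2.0, -1.0]
y = [0.0, 1.0, 0.0, 2.0]
u = [1.0, 2.0, 0.5, 1.0]
v = [1.0, 1.5, 0.0, 0.75]
print(cross_term(x, y, u, v))        # 6.0, i.e. 2 * 1.0 * 3.0

U = [u, [0.0, 0.0, 1.0, 0.0]]
V = [v, [0.0, 0.0, 0.0, 1.0]]
W = [[1.0, 0.0], [0.5, 0.5]]
print(mapping_units(x, y, U, V, W))  # [3.0, 3.5]
```

The toy filters only demonstrate the algebra; in the energy and gated models the abstract refers to, the filters would be learned (often as oriented, phase-shifted pairs) rather than hand-set.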

INDEX TERMS

Standards, Computational modeling, Training, Logic gates, Mathematical model, Image recognition, Learning systems, complex cells, Learning image relations, spatiotemporal features, mapping units, energy models

CITATION

R. Memisevic, "Learning to Relate Images,"

*IEEE Transactions on Pattern Analysis & Machine Intelligence*, vol. 35, no. 8, pp. 1829-1846, Aug. 2013, doi:10.1109/TPAMI.2013.53.