Issue No. 09, Sept. 2013 (vol. 35)
pp. 2206-2222
M. Ranzato, Dept. of Comput. Sci., Univ. of Toronto, Toronto, ON, Canada
V. Mnih, Dept. of Comput. Sci., Univ. of Toronto, Toronto, ON, Canada
J. M. Susskind, Machine Perception Lab., Univ. of California San Diego, La Jolla, CA, USA
G. E. Hinton, Dept. of Comput. Sci., Univ. of Toronto, Toronto, ON, Canada
ABSTRACT
This paper describes a Markov Random Field for real-valued image modeling that has two sets of latent variables. One set gates the interactions between all pairs of pixels, while the second set determines the mean intensity of each pixel. The resulting conditional distribution over the input is Gaussian, with both mean and covariance determined by the configuration of latent variables. Previous models, by contrast, were restricted to Gaussians with either a fixed mean or a diagonal covariance matrix. Thanks to this increased flexibility, the gated MRF can generate more realistic samples after training on an unconstrained distribution of high-resolution natural images. Furthermore, the latent variables of the model can be inferred efficiently and serve as very effective descriptors in recognition tasks. Both generation and discrimination improve drastically as layers of binary latent variables are added to the model, yielding a hierarchical model called a Deep Belief Network.
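To make the "mean and covariance both depend on the latent variables" idea concrete, here is a minimal two-pixel sketch in plain Python. The parametrization is an assumption chosen for illustration and is not quoted from the abstract: each active gate adds a rank-one term built from a hypothetical filter to an identity precision matrix, and the mean units contribute a bias that is then multiplied by the resulting covariance. The function name `conditional_gaussian` and all numeric values are hypothetical.

```python
def conditional_gaussian(filters, gates, mean_weights, mean_units):
    """Mean and covariance of a 2-pixel Gaussian given both sets of latents.

    Assumed (illustrative) form:
      precision = I + sum_j gates[j] * c_j c_j^T
      mean      = precision^{-1} * sum_k mean_units[k] * w_k
    """
    # Start from the identity precision; each gate adds c_j c_j^T scaled by its value.
    p = [[1.0, 0.0], [0.0, 1.0]]
    for c, g in zip(filters, gates):
        for i in range(2):
            for j in range(2):
                p[i][j] += g * c[i] * c[j]

    # Closed-form inverse of the 2x2 precision matrix gives the covariance.
    det = p[0][0] * p[1][1] - p[0][1] * p[1][0]
    cov = [[p[1][1] / det, -p[0][1] / det],
           [-p[1][0] / det, p[0][0] / det]]

    # Bias contributed by the active mean units.
    b = [sum(h * w[i] for w, h in zip(mean_weights, mean_units)) for i in range(2)]

    # The mean is the covariance applied to that bias.
    mean = [cov[i][0] * b[0] + cov[i][1] * b[1] for i in range(2)]
    return mean, cov


# Hypothetical setup: one filter that penalizes a difference between the two
# pixels, and one mean unit that pushes only the first pixel up.
smooth_filter = [[1.0, -1.0]]
push_first = [[3.0, 0.0]]

mean_on, cov_on = conditional_gaussian(smooth_filter, [1.0], push_first, [1.0])
mean_off, cov_off = conditional_gaussian(smooth_filter, [0.0], push_first, [1.0])

print("gate on: ", mean_on, cov_on)
print("gate off:", mean_off, cov_off)
```

With the gate on, the covariance has nonzero off-diagonal entries and the mean is pulled toward equal pixel values. With the gate off, the covariance reduces to the identity and the mean equals the raw bias. Under the assumptions above, this shows how one set of latent variables can switch pairwise pixel interactions while the other sets the mean intensities.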
INDEX TERMS
Image reconstruction, Probabilistic logic, Logic gates, Computational modeling, Covariance matrix, Vectors, Adaptation models, facial expression recognition, Gated MRF, natural images, deep learning, unsupervised learning, density estimation, energy-based model, Boltzmann machine, factored 3-way model, generative model, object recognition, denoising
CITATION
M. Ranzato, V. Mnih, J. M. Susskind, G. E. Hinton, "Modeling Natural Images Using Gated MRFs," IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 35, no. 9, pp. 2206-2222, Sept. 2013, doi:10.1109/TPAMI.2013.29