Issue No. 08, Aug. 2013 (vol. 35)
pp. 1958-1971
R. Salakhutdinov , Dept. of Stat. & Comput. Sci., Univ. of Toronto, Toronto, ON, Canada
J. B. Tenenbaum , Dept. of Brain & Cognitive Sci., Massachusetts Inst. of Technol., Cambridge, MA, USA
A. Torralba , Comput. Sci. & Artificial Intell. Lab., Massachusetts Inst. of Technol., Cambridge, MA, USA
ABSTRACT
We introduce HD (or “Hierarchical-Deep”) models, a new compositional learning architecture that integrates deep learning models with structured hierarchical Bayesian (HB) models. Specifically, we show how to learn a hierarchical Dirichlet process (HDP) prior over the activities of the top-level features in a deep Boltzmann machine (DBM). This compound HDP-DBM model learns to learn novel concepts from very few training examples by learning low-level generic features, high-level features that capture correlations among low-level features, and a category hierarchy for sharing priors over the high-level features that are typical of different kinds of concepts. We present efficient learning and inference algorithms for the HDP-DBM model and show that it can learn new concepts from very few examples on CIFAR-100 object recognition, handwritten character recognition, and human motion capture datasets.
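The idea of a category hierarchy that shares a prior over top-level features can be illustrated with a truncated hierarchical Dirichlet process. The sketch below is not the authors' HDP-DBM implementation: it leaves out the DBM entirely and only shows how each category's weights over a shared set of K components are drawn around its super-category's weights, which are in turn drawn around global stick-breaking weights. The truncation level K, the concentrations GAMMA and ALPHA, the function names, and the toy category tree are all assumptions chosen for illustration.

```python
import random

# Assumed hyperparameters for illustration only; not values from the paper.
K = 20        # truncation level: number of shared top-level components
GAMMA = 5.0   # concentration of the global (top) DP
ALPHA = 5.0   # concentration at each lower level of the tree


def truncated_gem(gamma, k, rng):
    """Stick-breaking weights beta ~ GEM(gamma), truncated to k atoms."""
    weights, remaining = [], 1.0
    for _ in range(k - 1):
        b = rng.betavariate(1.0, gamma)
        weights.append(remaining * b)
        remaining *= 1.0 - b
    weights.append(remaining)  # last atom takes the leftover stick, so weights sum to 1
    return weights


def dirichlet(alphas, rng):
    """Dirichlet draw via normalized Gamma variates."""
    # Floor keeps every shape parameter > 0 if a parent weight underflows to 0.
    draws = [rng.gammavariate(max(a, 1e-12), 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_tree(tree, rng):
    """Draw weights for the levels global -> super-category -> category.

    With K shared atoms, G_child ~ DP(ALPHA, G_parent) reduces to
    pi_child ~ Dirichlet(ALPHA * pi_parent) over the same atoms.
    """
    beta = truncated_gem(GAMMA, K, rng)
    weights = {}
    for sup, categories in tree.items():
        pi_sup = dirichlet([ALPHA * b for b in beta], rng)
        weights[sup] = pi_sup
        for cat in categories:
            weights[cat] = dirichlet([ALPHA * p for p in pi_sup], rng)
    return beta, weights


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical two-level category tree, for illustration only.
    tree = {"animal": ["cat", "dog"], "vehicle": ["car", "truck"]}
    beta, weights = sample_tree(tree, rng)
    top = sorted(range(K), key=lambda k: -weights["cat"][k])[:3]
    print("most heavily weighted shared components for 'cat':", top)
```

Because every category draws from the same K components, sibling categories such as "cat" and "dog" tend to place weight on similar components inherited from their super-category; that inherited weighting is the kind of shared prior a novel category with very few examples could start from.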
INDEX TERMS
Approximation methods, Machine learning, Stochastic processes, Computational modeling, Vectors, Bayesian methods, Training, one-shot learning, Deep networks, deep Boltzmann machines, hierarchical Bayesian models
CITATION
R. Salakhutdinov, J. B. Tenenbaum, A. Torralba, "Learning with Hierarchical-Deep Models", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol.35, no. 8, pp. 1958-1971, Aug. 2013, doi:10.1109/TPAMI.2012.269