Issue No. 08, Aug. 2013 (vol. 35)
pp. 1798-1828
Y. Bengio , Dept. of Comput. Sci. & Oper. Res., Univ. de Montreal, Montreal, QC, Canada
A. Courville , Dept. of Comput. Sci. & Oper. Res., Univ. de Montreal, Montreal, QC, Canada
P. Vincent , Dept. of Comput. Sci. & Oper. Res., Univ. de Montreal, Montreal, QC, Canada
ABSTRACT
The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide, to varying degrees, the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, about computing representations (i.e., inference), and about the geometrical connections between representation learning, density estimation, and manifold learning.
INDEX TERMS
Learning systems, Machine learning, Abstracts, Feature extraction, Manifolds, Neural networks, Speech recognition, neural nets, Deep learning, representation learning, feature learning, unsupervised learning, Boltzmann machine, autoencoder
CITATION
Y. Bengio, A. Courville, P. Vincent, "Representation Learning: A Review and New Perspectives", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol.35, no. 8, pp. 1798-1828, Aug. 2013, doi:10.1109/TPAMI.2013.50
REFERENCES
[1] G. Alain and Y. Bengio, "What Regularized Auto-Encoders Learn from the Data Generating Distribution," Technical Report, Arxiv report 1211.4246, Univ. de Montréal, 2012.
[2] S. Amari, "Natural Gradient Works Efficiently in Learning," Neural Computation, vol. 10, no. 2, pp. 251-276, 1998.
[3] F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, "Structured Sparsity through Convex Optimization," Statistical Science, vol. 27, pp. 450-468, 2012.
[4] J.A. Bagnell and D.M. Bradley, "Differentiable Sparse Coding," Proc. Neural Information and Processing Systems, pp. 113-120, 2009.
[5] H. Baird, "Document Image Defect Models," Proc. IAPR Workshop, Syntactic and Structural Patten Recognition, pp. 38-46, 1990.
[6] S. Becker and G. Hinton, "A Self-Organizing Neural Network That Discovers Surfaces in Random-Dot Stereograms," Nature, vol. 355, pp. 161-163, 1992.
[7] M. Belkin and P. Niyogi, "Laplacian Eigenmaps for Dimensionality Reduction and Data Representation," Neural Computation, vol. 15, no. 6, pp. 1373-1396, 2003.
[8] A. Bell and T.J. Sejnowski, "The Independent Components of Natural Scenes Are Edge Filters," Vision Research, vol. 37, pp. 3327-3338, 1997.
[9] Y. Bengio, "A Connectionist Approach to Speech Recognition," Int'l J. Pattern Recognition and Artificial Intelligence, vol. 7, no. 4, pp. 647-668, 1993.
[10] Y. Bengio, "Neural Net Language Models," Scholarpedia, vol. 3, no. 1, 2008.
[11] Y. Bengio, "Learning Deep Architectures for AI," Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1-127, 2009.
[12] Y. Bengio, "Deep Learning of Representations for Unsupervised and Transfer Learning," JMLR Workshops and Conf. Proc., vol. 27, pp. 17-36, 2012.
[13] Y. Bengio, "Practical Recommendations for Gradient-Based Training of Deep Architectures," Neural Networks: Tricks of the Trade, K.-R. Müller, G. Montavon, and G.B. Orr, eds., Springer 2013.
[14] Y. Bengio and O. Delalleau, "Justifying and Generalizing Contrastive Divergence," Neural Computation, vol. 21, no. 6, pp. 1601-1621, 2009.
[15] Y. Bengio and O. Delalleau, "On the Expressive Power of Deep Architectures," Proc. Int'l Conf. Algorithmic Learning Theory, 2011.
[16] Y. Bengio and Y. LeCun, "Scaling Learning Algorithms Towards AI," Large Scale Kernel Machines, L. Bottou, O. Chapelle, D. DeCoste, and J. Weston, eds., MIT Press, 2007.
[17] Y. Bengio and M. Monperrus, "Non-Local Manifold Tangent Learning," Proc. Neural Information and Processing Systems, pp. 129-136, 2004.
[18] Y. Bengio, P. Simard, and P. Frasconi, "Learning Long-Term Dependencies with Gradient Descent Is Difficult," IEEE Trans. Neural Networks, vol. 5, no. 2, pp. 157-166, Mar. 1994.
[19] Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin, "A Neural Probabilistic Language Model," J. Machine Learning Research, vol. 3, pp. 137-1155, 2003.
[20] Y. Bengio, J.-F. Paiement, P. Vincent, O. Delalleau, N. Le Roux, and M. Ouimet, "Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering," Proc. Neural Information and Processing Systems, 2003.
[21] Y. Bengio, O. Delalleau, and N. Le Roux, "The Curse of Highly Variable Functions for Local Kernel Machines," Proc. Neural Information and Processing Systems, 2005.
[22] Y. Bengio, H. Larochelle, and P. Vincent, "Non-Local Manifold Parzen Windows," Proc. Neural Information and Processing Systems, 2005.
[23] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, "Greedy Layer-Wise Training of Deep Networks," Proc. Neural Information and Processing Systems, 2006.
[24] Y. Bengio, J. Louradour, R. Collobert, and J. Weston, "Curriculum Learning," Proc. Int'l Conf. Machine Learning, 2009.
[25] Y. Bengio, O. Delalleau, and C. Simard, "Decision Trees Do Not Generalize to New Variations," Computational Intelligence, vol. 26, no. 4, pp. 449-467, 2010.
[26] Y. Bengio, G. Alain, and S. Rifai, "Implicit Density Estimation by Local Moment Matching to Sample from Auto-Encoders," Technical Report, arXiv:1207.0057, 2012.
[27] Y. Bengio, G. Mesnil, Y. Dauphin, and S. Rifai, "Better Mixing via Deep Representations," Proc. Int'l Conf. Machine Learning, 2013.
[28] J. Bergstra and Y. Bengio, "Slow, Decorrelated Features for Pretraining Complex Cell-Like Networks," Proc. Neural Information and Processing Systems, 2009.
[29] J. Bergstra and Y. Bengio, "Random Search for Hyper-Parameter Optimization," J. Machine Learning Research, vol. 13, pp. 281-305, 2012.
[30] J. Bergstra, R. Bardenet, Y. Bengio, and B. Kégl, "Algorithms for Hyper-Parameter Optimization," Proc. Neural Information and Processing Systems, 2011.
[31] P. Berkes and L. Wiskott, "Slow Feature Analysis Yields a Rich Repertoire of Complex Cell Properties," J. Vision, vol. 5, no. 6, pp. 579-602, 2005.
[32] J. Besag, "Statistical Analysis of Non-Lattice Data," The Statistician, vol. 24, no. 3, pp. 179-195, 1975.
[33] A. Bordes, X. Glorot, J. Weston, and Y. Bengio, "Joint Learning of Words and Meaning Representations for Open-Text Semantic Parsing," Proc. Int'l Conf. Artificial Intelligence and Statistics, 2012.
[34] N. Boulanger-Lewandowski, Y. Bengio, and P. Vincent, "Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription," Proc. Int'l Conf. Machine Learning, 2012.
[35] Y. Boureau, J. Ponce, and Y. LeCun, "A Theoretical Analysis of Feature Pooling in Vision Algorithms," Proc. Int'l Conf. Machine Learning, 2010.
[36] Y. Boureau, N. Le Roux, F. Bach, J. Ponce, and Y. LeCun, "Ask the Locals: Multi-Way Local Pooling for Image Recognition," Proc. IEEE Int'l Conf. Computer Vision, 2011.
[37] H. Bourlard and Y. Kamp, "Auto-Association by Multilayer Perceptrons and Singular Value Decomposition," Biological Cybernetics, vol. 59, pp. 291-294, 1988.
[38] M. Brand, "Charting a Manifold," Proc. Neural Information and Processing Systems, pp. 961-968, 2002.
[39] O. Breuleux, Y. Bengio, and P. Vincent, "Quickly Generating Representative Samples from an RBM-Derived Process," Neural Computation, vol. 23, no. 8, pp. 2053-2073, 2011.
[40] J. Bruna and S. Mallat, "Classification with Scattering Operators," Proc. Int'l Conf. Pattern Recognition, 2011.
[41] C. Cadieu and B. Olshausen, "Learning Transformational Invariants from Natural Movies," Proc. Neural Information and Processing Systems, pp. 209-216, 2009.
[42] M.A. Carreira-Perpiñan and G.E. Hinton, "On Contrastive Divergence Learning," Proc. Int'l Workshop Artificial Intelligence and Statistics, pp. 33-40, 2005.
[43] M. Chen, Z. Xu, K.Q. Winberger, and F. Sha, "Marginalized Denoising Autoencoders for Domain Adaptation," Proc. Int'l Conf. Machine Learning, 2012.
[44] K. Cho, T. Raiko, and A. Ilin, "Parallel Tempering Is Efficient for Learning Restricted Boltzmann Machines," Proc. Int'l Joint Conf. Neural Networks, 2010.
[45] K. Cho, T. Raiko, and A. Ilin, "Enhanced Gradient and Adaptive Learning Rate for Training Restricted Boltzmann Machines," Proc. Int'l Conf. Machine Learning, pp. 105-112, 2011.
[46] D. Ciresan, U. Meier, and J. Schmidhuber, "Multi-Column Deep Neural Networks for Image Classification," Technical Report, arXiv:1202.2745, 2012.
[47] D.C. Ciresan, U. Meier, L.M. Gambardella, and J. Schmidhuber, "Deep Big Simple Neural Nets for Handwritten Digit Recognition," Neural Computation, vol. 22, pp. 1-14, 2010.
[48] A. Coates and A.Y. Ng, "The Importance of Encoding Versus Training with Sparse Coding and Vector Quantization," Proc. Int'l Conf. Machine Learning, 2011.
[49] A. Coates and A.Y. Ng, "Selecting Receptive Fields in Deep Networks," Proc. Neural Information and Processing Systems, 2011.
[50] R. Collobert and J. Weston, "A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning," Proc. Int'l Conf. Machine Learning, 2008.
[51] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa, "Natural Language Processing (almost) from Scratch," J. Machine Learning Research, vol. 12, pp. 2493-2537, 2011.
[52] A. Courville, J. Bergstra, and Y. Bengio, "A Spike and Slab Restricted Boltzmann Machine," Proc. Int'l Conf. Artificial Intelligence and Statistics, 2011.
[53] A. Courville, J. Bergstra, and Y. Bengio, "Unsupervised Models of Images by Spike-and-Slab RBMs," Proc. Int'l Conf. Machine Learning, 2011.
[54] G.E. Dahl, M. Ranzato, A. Mohamed, and G.E. Hinton, "Phone Recognition with the Mean-Covariance Restricted Boltzmann Machine," Proc. Neural Information and Processing Systems, 2010.
[55] G.E. Dahl, D. Yu, L. Deng, and A. Acero, "Context-Dependent Pre-Trained Deep Neural Networks for Large Vocabulary Speech Recognition," IEEE Trans. Audio, Speech, and Language Processing, vol. 20, no. 1, pp. 33-42, Jan. 2012.
[56] L. Deng, M. Seltzer, D. Yu, A. Acero, A. Mohamed, and G. Hinton, "Binary Coding of Speech Spectrograms Using a Deep Auto-Encoder," Proc. Ann. Conf. Int'l Speech Comm. Assoc., 2010.
[57] G. Desjardins and Y. Bengio, "Empirical Evaluation of Convolutional RBMs for Vision," Technical Report 1327, Dept. IRO, Univ. of Montréal, 2008.
[58] G. Desjardins, A. Courville, Y. Bengio, P. Vincent, and O. Delalleau, "Tempered Markov Chain Monte Carlo for Training of Restricted Boltzmann Machine," Proc. Conf. Artificial Intelligence and Statistics, vol. 9, pp. 145-152, 2010.
[59] G. Desjardins, A. Courville, and Y. Bengio, "On Tracking the Partition Function," Proc. Neural Information and Processing Systems, 2011.
[60] G. Desjardins, A. Courville, and Y. Bengio, "On Training Deep Boltzmann Machines," Technical Report, arXiv:1203.4416v1, Univ. of Montréal, 2012.
[61] J. DiCarlo, D. Zoccolan, and N. Rust, "How Does the Brain Solve Visual Object Recognition?" Neuron, vol. 73, pp. 415-434, 2012.
[62] D.L. Donoho and C. Grimes, "Hessian Eigenmaps: New Locally Linear Embedding Techniques for High-Dimensional Data," Technical Report 2003-08, Dept. of Statistics, Stanford Univ., 2003.
[63] J. Eisner, "Learning Approximate Inference Policies for Fast Prediction," Proc. ICML Workshop Interactions between Search and Learning, 2012.
[64] D. Erhan, A. Courville, and Y. Bengio, "Understanding Representations Learned in Deep Architectures," Technical Report 1355, Univ. of Montréal/DIRO, 2010.
[65] D. Erhan, Y. Bengio, A. Courville, P.-A. Manzagol, P. Vincent, and S. Bengio, "Why Does Unsupervised Pre-Training Help Deep Learning?" J. Machine Learning Research, vol. 11, pp. 625-660, 2010.
[66] Y. Freund and D. Haussler, "Unsupervised Learning of Distributions on Binary Vectors Using Two Layer Networks," Technical Report UCSC-CRL-94-25, Univ. of California, Santa Cruz, 1994.
[67] K. Fukushima, "Neocognitron: A Self-Organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position," Biological Cybernetics, vol. 36, pp. 193-202, 1980.
[68] X. Glorot and Y. Bengio, "Understanding the Difficulty of Training Deep Feedforward Neural Networks," Proc. Conf. Artificial Intelligence and Statistics, 2010.
[69] X. Glorot, A. Bordes, and Y. Bengio, "Deep Sparse Rectifier Neural Networks," Proc. Conf. Artificial Intelligence and Statistics, 2011.
[70] X. Glorot, A. Bordes, and Y. Bengio, "Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach," Proc. Int'l Conf. Machine Learning, 2011.
[71] I. Goodfellow, Q. Le, A. Saxe, and A. Ng, "Measuring Invariances in Deep Networks," Proc. Neural Information and Processing System, pp. 646-654, 2009.
[72] I. Goodfellow, A. Courville, and Y. Bengio, "Spike-and-Slab Sparse Coding for Unsupervised Feature Discovery," Proc. NIPS Workshop Challenges in Learning Hierarchical Models, 2011.
[73] I.J. Goodfellow, A. Courville, and Y. Bengio, "Spike-and-Slab Sparse Coding for Unsupervised Feature Discovery," arXiv:1201.3382, 2012.
[74] K. Gregor and Y. LeCun, "Emergence of Complex-Like Cells in a Temporal Product Network with Local Receptive Fields," Technical Report, arXiv:1006.0448, 2010.
[75] K. Gregor and Y. LeCun, "Learning Fast Approximations of Sparse Coding," Proc. Int'l Conf. Machine Learning, 2010.
[76] K. Gregor, A. Szlam, and Y. LeCun, "Structured Sparse Coding via Lateral Inhibition," Proc. Neural Information and Processing Systems, 2011.
[77] R. Gribonval, "Should Penalized Least Squares Regression Be Interpreted as Maximum A Posteriori Estimation?" IEEE Trans. Signal Processing, vol. 59, no. 5, pp. 2405-2410, May 2011.
[78] R. Grosse, R. Raina, H. Kwong, and A.Y. Ng, "Shift-Invariant Sparse Coding for Audio Classification," Proc. Conf. Uncertainty in Artificial Intelligence, 2007.
[79] A. Grubb and J.A.D. Bagnell, "Boosted Backpropagation Learning for Training Deep Modular Networks," Proc. Int'l Conf. Machine Learning, 2010.
[80] M. Gutmann and A. Hyvarinen, "Noise-Contrastive Estimation: A New Estimation Principle for Unnormalized Statistical Models," Proc. Conf. Artificial Intelligence and Statistics, 2010.
[81] P. Hamel, S. Lemieux, Y. Bengio, and D. Eck, "Temporal Pooling and Multiscale Learning for Automatic Annotation and Ranking of Music Audio," Proc. Int'l Conf. Music Information Retrieval, 2011.
[82] J. Håstad, "Almost Optimal Lower Bounds for Small Depth Circuits," Proc. 18th Ann. ACM Symp. Theory of Computing, pp. 6-20, 1986.
[83] J. Håstad and M. Goldmann, "On the Power of Small-Depth Threshold Circuits," Computational Complexity, vol. 1, pp. 113-129, 1991.
[84] M. Henaff, K. Jarrett, K. Kavukcuoglu, and Y. LeCun, "Unsupervised Learning of Sparse Features for Scalable Audio Classification," Proc. Int'l Conf. Music Information Retrieva, 2011.
[85] G. Hinton, A. Krizhevsky, and S. Wang, "Transforming Auto-Encoders," Proc. Int'l Conf. Artificial Neural Networks, 2011.
[86] G. Hinton, L. Deng, G.E. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath, and B. Kingsbury, "Deep Neural Networks for Acoustic Modeling in Speech Recognition," IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82-97, Nov. 2012.
[87] G.E. Hinton, "Learning Distributed Representations of Concepts," Proc. Eighth Conf. Cognitive Science Soc., pp. 1-12, 1986.
[88] G.E. Hinton, "Products of Experts," Proc. Int'l Conf. Artificial Neural Networks, 1999.
[89] G.E. Hinton, "Training Products of Experts by Minimizing Contrastive Divergence," Technical Report GCNU TR 2000-004, Gatsby Unit, Univ. College London, 2000.
[90] G.E. Hinton, "A Practical Guide to Training Restricted Boltzmann Machines," Technical Report UTML TR 2010-003, Dept. of Computer Science, Univ. of Toronto, 2010.
[91] G.E. Hinton and S. Roweis, "Stochastic Neighbor Embedding," Proc. Neural Information and Processing System, 2002.
[92] G.E. Hinton and R. Salakhutdinov, "Reducing the Dimensionality of Data with Neural Networks," Science, vol. 313, no. 5786, pp. 504-507, 2006.
[93] G.E. Hinton and R.S. Zemel, "Autoencoders, Minimum Description Length, and Helmholtz Free Energy," Proc. Neural Information and Processing Systems, 1993.
[94] G.E. Hinton, S. Osindero, and Y. Teh, "A Fast Learning Algorithm for Deep Belief Nets," Neural Computation, vol. 18, pp. 1527-1554, 2006.
[95] D.H. Hubel and T.N. Wiesel, "Receptive Fields of Single Neurons in the Cat's Striate Cortex," J. Physiology, vol. 148, pp. 574-591, 1959.
[96] J. Hurri and A. Hyvärinen, "Temporal Coherence, Natural Image Sequences, and the Visual Cortex," Proc. Neural Information and Processing Systems, 2002.
[97] A. Hyvärinen, "Estimation of Non-Normalized Statistical Models Using Score Matching," J. Machine Learning Research, vol. 6, pp. 695-709, 2005.
[98] A. Hyvärinen, "Some Extensions of Score Matching," Computational Statistics and Data Analysis, vol. 51, pp. 2499-2512, 2007.
[99] A. Hyvärinen, "Optimal Approximation of Signal Priors," Neural Computation, vol. 20, no. 12, pp. 3087-3110, 2008.
[100] A. Hyvärinen and P. Hoyer, "Emergence of Phase and Shift Invariant Features by Decomposition of Natural Images into Independent Feature Subspaces," Neural Computation, vol. 12, no. 7, pp. 1705-1720, 2000.
[101] A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis. Wiley, 2001.
[102] A. Hyvärinen, P.O. Hoyer, and M. Inki, "Topographic Independent Component Analysis," Neural Computation, vol. 13, no. 7, pp. 1527-1558, 2001.
[103] A. Hyvärinen, J. Hurri, and P.O. Hoyer, Natural Image Statistics: A Probabilistic Approach to Early Computational Vision. Springer-Verlag, 2009.
[104] H. Jaeger, "Echo State Network," Scholarpedia, vol. 2, no. 9, p. 2330, 2007.
[105] V. Jain and S.H. Seung, "Natural Image Denoising with Convolutional Networks," Proc. Neural Information and Processing Systems, 2008.
[106] K. Jarrett, K. Kavukcuoglu, M. Ranzato, and Y. LeCun, "What Is the Best Multi-Stage Architecture for Object Recognition?" Proc. IEEE Int'l Conf. Computer Vision, 2009.
[107] R. Jenatton, J.-Y. Audibert, and F. Bach, "Structured Variable Selection with Sparsity-Inducing Norms," Technical Report, arXiv:0904.3523, 2009.
[108] C. Jutten and J. Herault, "Blind Separation of Sources, Part I: An Adaptive Algorithm Based on Neuromimetic Architecture," Signal Processing, vol. 24, pp. 1-10, 1991.
[109] K. Kavukcuoglu, M. Ranzato, and Y. LeCun, "Fast Inference in Sparse Coding Algorithms with Applications to Object Recognition," Technical Report CBLL-TR-2008-12-01, New York Univ., 2008.
[110] K. Kavukcuoglu, M.-A. Ranzato, R. Fergus, and Y. LeCun, "Learning Invariant Features through Topographic Filter Maps," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2009.
[111] K. Kavukcuoglu, P. Sermanet, Y.-L. Boureau, K. Gregor, M. Mathieu, and Y. LeCun, "Learning Convolutional Feature Hierarchies for Visual Recognition," Proc. Neural Information and Processing Systems, 2010.
[112] D. Kingma and Y. LeCun, "Regularized Estimation of Image Statistics by Score Matching," Proc. Neural Information and Processing Systems, 2010.
[113] J.J. Kivinen and C.K.I. Williams, "Multiple Texture Boltzmann Machines," Proc. Conf. Artificial Intelligence and Statistics, 2012.
[114] K.P. Körding, C. Kayser, W. Einhäuser, and P. König, "How Are Complex Cell Properties Adapted to the Statistics of Natural Stimuli?" J. Neurophysiology, vol. 91, pp. 206-212, 2004.
[115] A. Krizhevsky, "Convolutional Deep Belief Networks on CIFAR-10," technical report, Univ. of Toronto, 2010.
[116] A. Krizhevsky and G. Hinton, "Learning Multiple Layers of Features from Tiny Images," technical report, Univ. of Toronto, 2009.
[117] A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," Proc. Neural Information and Processing Systems, 2012.
[118] H. Larochelle and Y. Bengio, "Classification Using Discriminative Restricted Boltzmann Machines," Proc. Int'l Conf. Machine Learning, 2008.
[119] H. Larochelle, Y. Bengio, J. Louradour, and P. Lamblin, "Exploring Strategies for Training Deep Neural Networks," J. Machine Learning Research, vol. 10, pp. 1-40, 2009.
[120] S. Lazebnik, C. Schmid, and J. Ponce, "Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2006.
[121] H.-S. Le, I. Oparin, A. Allauzen, J.-L. Gauvin, and F. Yvon, "Structured Output Layer Neural Network Language Models for Speech Recognition," IEEE Trans. Audio, Speech, and Language Processing, vol. 21, no. 1, pp. 197-206, Jan. 2013.
[122] Q. Le, J. Ngiam, Z. hao Chen, D.J. Chia, P.W. Koh, and A. Ng, "Tiled Convolutional Neural Networks," Proc. Neural Information and Processing Systems, 2010.
[123] Q. Le, J. Ngiam, A. Coates, A. Lahiri, B. Prochnow, and A. Ng, "On Optimization Methods for Deep Learning," Proc. Int'l Conf. Machine Learning, 2011.
[124] Q.V. Le, A. Karpenko, J. Ngiam, and A.Y. Ng, "ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning," Proc. Neural Information and Processing Systems, 2011.
[125] Q.V. Le, W.Y. Zou, S.Y. Yeung, and A.Y. Ng, "Learning Hierarchical Spatio-Temporal Features for Action Recognition with Independent Subspace Analysis," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2011.
[126] N. Le Roux, Y. Bengio, P. Lamblin, M. Joliveau, and B. Kegl, "Learning the 2-D Topology of Images," Proc. Neural Information and Processing Systems, 2007.
[127] N. Le Roux, P.-A. Manzagol, and Y. Bengio, "Topmoumoute Online Natural Gradient Algorithm," Proc. Neural Information and Processing Systems, 2007.
[128] Y. LeCun, "Learning Processes in an Asymmetric Threshold Network," Disordered Systems and Biological Organization, pp. 233-240, Springer-Verlag, 1986.
[129] Y. LeCun, "Modèles Connexionistes de l'apprentissage," PhD thesis, Univ. de Paris VI, 1987.
[130] Y. LeCun, "Generalization and Network Design Strategies," Connectionism in Perspective, Elsevier, 1989.
[131] Y. LeCun, B. Boser, J.S. Denker, D. Henderson, R.E. Howard, W. Hubbard, and L.D. Jackel, "Backpropagation Applied to Handwritten Zip Code Recognition," Neural Computation, vol. 1, pp. 541-551, 1989.
[132] Y. LeCun, L. Bottou, G.B. Orr, and K. Müller, "Efficient Backprop," Neural Networks, Tricks of the Trade, Springer, 1998.
[133] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient Based Learning Applied to Document Recognition," Proc. IEEE, vol. 86, no. 11, pp. 2278-2324, Nov. 1998.
[134] H. Lee, C. Ekanadham, and A. Ng, "Sparse Deep Belief Net Model for Visual Area V2," Proc. Neural Information and Processing Systems, 2007.
[135] H. Lee, R. Grosse, R. Ranganath, and A.Y. Ng, "Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations," Proc. Int'l Conf. Machine Learning, 2009.
[136] H. Lee, P. Pham, Y. Largman, and A. Ng, "Unsupervised Feature Learning for Audio Classification Using Convolutional Deep Belief Networks," Proc. Neural Information and Processing System, 2009.
[137] Y. Lin, Z. Tong, S. Zhu, and K. Yu, "Deep Coding Network," Proc. Neural Information and Processing Systems, 2010.
[138] D. Lowe, "Object Recognition from Local Scale Invariant Features," Proc. IEEE Int'l Conf. Computer Vision, 1999.
[139] S. Mallat, "Group Invariant Scattering," Comm. Pure and Applied Math., 2012.
[140] B. Marlin and N. de Freitas, "Asymptotic Efficiency of Deterministic Estimators for Discrete Energy-Based Models: Ratio Matching and Pseudolikelihood," Proc. Conf. Uncertainty in Artificial Intelligence, 2011.
[141] B. Marlin, K. Swersky, B. Chen, and N. de Freitas, "Inductive Principles for Restricted Boltzmann Machine Learning," Proc. Conf. Artificial Intelligence and Statistics, pp. 509-516, 2010.
[142] J. Martens, "Deep Learning via Hessian-Free Optimization," Proc. Int'l Conf. Machine Learning, pp. 735-742, 2010.
[143] J. Martens and I. Sutskever, "Learning Recurrent Neural Networks with Hessian-Free Optimization," Proc. Int'l Conf. Machine Learning, 2011.
[144] R. Memisevic and G.E. Hinton, "Learning to Represent Spatial Transformations with Factored Higher-Order Boltzmann Machines," Neural Computation, vol. 22, no. 6, pp. 1473-1492, 2010.
[145] G. Mesnil, Y. Dauphin, X. Glorot, S. Rifai, Y. Bengio, I. Goodfellow, E. Lavoie, X. Muller, G. Desjardins, D. Warde-Farley, P. Vincent, A. Courville, and J. Bergstra, "Unsupervised and Transfer Learning Challenge: A Deep Learning Approach," Proc. Unsupervised and Transfer Learning Challenge and Workshop, vol. 7, 2011.
[146] T. Mikolov, A. Deoras, S. Kombrink, L. Burget, and J. Cernocky, "Empirical Evaluation and Combination of Advanced Language Modeling Techniques," Proc. Ann. Conf. Int'l Speech Comm. Assoc., 2011.
[147] H. Mobahi, R. Collobert, and J. Weston, "Deep Learning from Temporal Coherence in Video," Proc. Int'l Conf. Machine Learning, 2009.
[148] A. Mohamed, G. Dahl, and G. Hinton, "Acoustic Modeling Using Deep Belief Networks," IEEE Trans. Audio, Speech, and Language Processing, vol. 20, no. 1, pp. 14-22, Jan. 2012.
[149] G.F. Montufar and J. Morton, "When Does a Mixture of Products Contain a Product of Mixtures?" Technical Report, arXiv:1206.0387, 2012.
[150] I. Murray and R. Salakhutdinov, "Evaluating Probabilities under High-Dimensional Latent Variable Models," Proc. Neural Information and Processing Systems, pp. 1137-1144, 2008.
[151] V. Nair and G.E. Hinton, "Rectified Linear Units Improve Restricted Boltzmann Machines," Proc. Int'l Conf. Machine Learning, 2010.
[152] R.M. Neal, "Connectionist Learning of Belief Networks," Artificial Intelligence, vol. 56, pp. 71-113, 1992.
[153] R.M. Neal, "Probabilistic Inference Using Markov Chain Monte-Carlo Methods," Technical Report CRG-TR-93-1, Dept. of Computer Science, Univ. of Toronto, 1993.
[154] J. Ngiam, Z. Chen, P. Koh, and A. Ng, "Learning Deep Energy Models," Proc. Int'l Conf. Machine Learning, 2011.
[155] B.A. Olshausen and D.J. Field, "Emergence of Simple-Cell Receptive Field Properties by Learning a Sparse Code for Natural Images," Nature, vol. 381, pp. 607-609, 1996.
[156] Neural Networks: Tricks of the Trade, G. Orr and K.-R. Muller, eds. Springer-Verlag, 1998.
[157] R. Pascanu and Y. Bengio, "Natural Gradient Revisited," Technical Report, arXiv:1301.3584, 2013.
[158] T. Raiko, H. Valpola, and Y. LeCun, "Deep Learning Made Easier by Linear Transformations in Perceptrons," Proc. Conf. Artificial Intelligence and Statistics, 2012.
[159] R. Raina, A. Battle, H. Lee, B. Packer, and A.Y. Ng, "Self-Taught Learning: Transfer Learning from Unlabeled Data," Proc. Int'l Conf. Machine Learning, 2007.
[160] M. Ranzato and G.H. Hinton, "Modeling Pixel Means and Covariances Using Factorized Third-Order Boltzmann Machines," Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 2551-2558, 2010.
[161] M. Ranzato, C. Poultney, S. Chopra, and Y. LeCun, "Efficient Learning of Sparse Representations with an Energy-Based Model," Proc. Neural Information and Processing Systems, 2006.
[162] M. Ranzato, Y. Boureau, and Y. LeCun, "Sparse Feature Learning for Deep Belief Networks," Proc. Neural Information and Processing Systems, 2007.
[163] M. Ranzato, A. Krizhevsky, and G. Hinton, "Factored 3-Way Restricted Boltzmann Machines for Modeling Natural Images," Proc. Conf. Artificial Intelligence and Statistics, pp. 621-628, 2010.
[164] M. Ranzato, V. Mnih, and G. Hinton, "Generating More Realistic Images Using Gated MRF's," Proc. Neural Information and Processing Systems, 2010.
[165] M. Ranzato, J. Susskind, V. Mnih, and G. Hinton, "On Deep Generative Models with Applications to Recognition," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2011.
[166] M. Riesenhuber and T. Poggio, "Hierarchical Models of Object Recognition in Cortex," Nature Neuroscience, vol. 2, pp. 1019-1025, 1999.
[167] S. Rifai, P. Vincent, X. Muller, X. Glorot, and Y. Bengio, "Contractive Auto-Encoders: Explicit Invariance during Feature Extraction," Proc. Int'l Conf. Machine Learning, 2011.
[168] S. Rifai, G. Mesnil, P. Vincent, X. Muller, Y. Bengio, Y. Dauphin, and X. Glorot, "Higher Order Contractive Auto-Encoder," Proc. European Conf. Machine Learning and Knowledge Discovery in Databases, 2011.
[169] S. Rifai, Y. Dauphin, P. Vincent, Y. Bengio, and X. Muller, "The Manifold Tangent Classifier," Proc. Neural Information and Processing Systems, 2011.
[170] S. Rifai, Y. Bengio, Y. Dauphin, and P. Vincent, "A Generative Process for Sampling Contractive Auto-Encoders," Proc. Int'l Conf. Machine Learning, 2012.
[171] S. Roweis, "EM Algorithms for PCA and Sensible PCA," CNS Technical Report CNS-TR-97-02, California Inst. of Tech nology, 1997.
[172] S. Roweis and L.K. Saul, "Nonlinear Dimensionality Reduction by Locally Linear Embedding," Science, vol. 290, no. 5500, pp. 2323-2326, 2000.
[173] R. Salakhutdinov, "Learning Deep Boltzmann Machines Using Adaptive MCMC," Proc. Int'l Conf. Machine Learning, 2010.
[174] R. Salakhutdinov, "Learning in Markov Random Fields Using Tempered Transitions," Proc. Neural Information and Processing Systems, 2010.
[175] R. Salakhutdinov and G.E. Hinton, "Semantic Hashing," Proc. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval, 2007.
[176] R. Salakhutdinov and G.E. Hinton, "Deep Boltzmann Machines," Proc. Conf. Artificial Intelligence and Statistics, pp. 448-455, 2009.
[177] R. Salakhutdinov and H. Larochelle, "Efficient Learning of Deep Boltzmann Machines," Proc. Conf. Artificial Intelligence and Statistics, 2010.
[178] R. Salakhutdinov, A. Mnih, and G.E. Hinton, "Restricted Boltzmann Machines for Collaborative Filtering," Proc. Int'l Conf. Machine Learning, 2007.
[179] F. Savard, "Réseaux de Neurones à Relaxation Entraînés par Critère d'Autoencodeur Débruitant," master's thesis, Univ. of Montréal, 2011.
[180] T. Schmah, G.E. Hinton, R. Zemel, S.L. Small, and S. Strother, "Generative versus Discriminative Training of RBMs for Classification of fMRI Images," Proc. Neural Information and Processing Systems, pp. 1409-1416, 2008.
[181] B. Schölkopf, A. Smola, and K.-R. Müller, "Nonlinear Component Analysis as a Kernel Eigenvalue Problem," Neural Computation, vol. 10, pp. 1299-1319, 1998.
[182] H. Schwenk, A. Rousseau, and M. Attik, "Large, Pruned or Continuous Space Language Models on a GPU for Statistical Machine Translation," Proc. Workshop the Future of Language Modeling for HLT, 2012.
[183] F. Seide, G. Li, and D. Yu, "Conversational Speech Transcription Using Context-Dependent Deep Neural Networks," Proc. Conf. Int'l Speech Comm. Assoc., pp. 437-440, 2011.
[184] F. Seide, G. Li, and D. Yu, "Feature Engineering in Context-Dependent Deep Neural Networks for Conversational Speech Transcription," Proc. IEEE Workshop Automatic Speech Recognition and Understanding, 2011.
[185] T. Serre, L. Wolf, S. Bileschi, and M. Riesenhuber, "Robust Object Recognition with Cortex-Like Mechanisms," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29, no. 3, pp. 411-426, Mar. 2007.
[186] S.H. Seung, "Learning Continuous Attractors in Recurrent Networks," Proc. Neural Information and Processing Systems, 1997.
[187] D. Simard, P.Y. Steinkraus, and J.C. Platt, "Best Practices for Convolutional Neural Networks," Proc. Seventh Int'l Conf. Document Analysis and Recognition, 2003.
[188] P. Simard, B. Victorri, Y. LeCun, and J. Denker, "Tangent Prop—A Formalism for Specifying Selected Invariances in an Adaptive Network," Proc. Neural Information and Processing Systems, 1991.
[189] P.Y. Simard, Y. LeCun, and J. Denker, "Efficient Pattern Recognition Using a New Transformation Distance," Proc. Neural Information and Processing Systems, 1992.
[190] P. Smolensky, "Information Processing in Dynamical Systems: Foundations of Harmony Theory," Parallel Distributed Processing, D.E. Rumelhart and J.L. McClelland, eds., vol. 1, chapter 6, pp. 194-281, MIT Press, 1986.
[191] J. Snoek, H. Larochelle, and R.P. Adams, "Practical Bayesian Optimization of Machine Learning Algorithms," Proc. Neural Information and Processing Systems, 2012.
[192] R. Socher, E.H. Huang, J. Pennington, A.Y. Ng, and C.D. Manning, "Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection," Proc. Neural Information and Processing Systems, 2011.
[193] R. Socher, J. Pennington, E.H. Huang, A.Y. Ng, and C.D. Manning, "Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions," Proc. Conf. Empirical Methods in Natural Language Processing, 2011.
[194] N. Srivastava and R. Salakhutdinov, "Multimodal Learning with Deep Boltzmann Machines," Proc. Neural Information and Processing Systems, 2012.
[195] V. Stoyanov, A. Ropson, and J. Eisner, "Empirical Risk Minimization of Graphical Model Parameters Given Approximate Inference, Decoding, and Model Structure," Proc. Conf. Artificial Intelligence and Statistics, 2011.
[196] I. Sutskever, "Training Recurrent Neural Networks," PhD thesis, Dept. of Computer Science, Univ. of Toronto, 2012.
[197] I. Sutskever and T. Tieleman, "On the Convergence Properties of Contrastive Divergence," Proc. Conf. Artificial Intelligence and Statistics, 2010.
[198] I. Sutskever, G. Hinton, and G. Taylor, "The Recurrent Temporal Restricted Boltzmann Machine," Proc. Neural Information and Processing Systems, 2008.
[199] K. Swersky, "Inductive Principles for Learning Restricted Boltzmann Machines," master's thesis, Univ. of British Columbia, 2010.
[200] K. Swersky, M. Ranzato, D. Buchman, B. Marlin, and N. de Freitas, "On Score Matching for Energy Based Models: Generalizing Autoencoders and Simplifying Deep Learning," Proc. Int'l Conf. Machine Learning, 2011.
[201] G. Taylor and G. Hinton, "Factored Conditional Restricted Boltzmann Machines for Modeling Motion Style," Proc. Int'l Conf. Machine Learning, 2009.
[202] G. Taylor, R. Fergus, Y. LeCun, and C. Bregler, "Convolutional Learning of Spatio-Temporal Features," Proc. European Conf. Computer Vision, 2010.
[203] J. Tenenbaum, V. de Silva, and J.C. Langford, "A Global Geometric Framework for Nonlinear Dimensionality Reduction," Science, vol. 290, no. 5500, pp. 2319-2323, 2000.
[204] T. Tieleman, "Training Restricted Boltzmann Machines Using Approximations to the Likelihood Gradient," Proc. Int'l Conf. Machine Learning, pp. 1064-1071, 2008.
[205] T. Tieleman and G. Hinton, "Using Fast Weights to Improve Persistent Contrastive Divergence," Proc. Int'l Conf. Machine Learning, 2009.
[206] M.E. Tipping and C.M. Bishop, "Probabilistic Principal Component Analysis," J. Royal Statistical Soc. B, vol. 61, no. 3, pp. 611-622, 1999.
[207] S.C. Turaga, J.F. Murray, V. Jain, F. Roth, M. Helmstaedter, K. Briggman, W. Denk, and H.S. Seung, "Convolutional Networks Can Learn to Generate Affinity Graphs for Image Segmentation," Neural Computation, vol. 22, pp. 511-538, 2010.
[208] L. van der Maaten, "Learning a Parametric Embedding by Preserving Local Structure," Proc. Conf. Artificial Intelligence and Statistics, 2009.
[209] L. van der Maaten and G.E. Hinton, "Visualizing High-Dimensional Data Using t-SNE," J. Machine Learning Research, vol. 9, pp. 2579-2605, 2008.
[210] P. Vincent, "A Connection between Score Matching and Denoising Autoencoders," Neural Computation, vol. 23, no. 7, pp. 1661-1674, 2011.
[211] P. Vincent and Y. Bengio, "Manifold Parzen Windows," Proc. Neural Information and Processing Systems, 2002.
[212] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, "Extracting and Composing Robust Features with Denoising Autoencoders," Proc. Int'l Conf. Machine Learning, 2008.
[213] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, "Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion," J. Machine Learning Research, vol. 11, pp. 3371-3408, 2010.
[214] K.Q. Weinberger and L.K. Saul, "Unsupervised Learning of Image Manifolds by Semidefinite Programming," Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 988-995, 2004.
[215] M. Welling, "Herding Dynamic Weights for Partially Observed Random Field Models," Proc. Conf. Uncertainty in Artificial Intelligence, 2009.
[216] M. Welling, G.E. Hinton, and S. Osindero, "Learning Sparse Topographic Representations with Products of Student-t Distributions," Proc. Neural Information and Processing Systems, 2002.
[217] J. Weston, F. Ratle, and R. Collobert, "Deep Learning via Semi-Supervised Embedding," Proc. Int'l Conf. Machine Learning, 2008.
[218] J. Weston, S. Bengio, and N. Usunier, "Large Scale Image Annotation: Learning to Rank with Joint Word-Image Embeddings," Machine Learning, vol. 81, no. 1, pp. 21-35, 2010.
[219] L. Wiskott and T. Sejnowski, "Slow Feature Analysis: Unsupervised Learning of Invariances," Neural Computation, vol. 14, no. 4, pp. 715-770, 2002.
[220] L. Younes, "On the Convergence of Markovian Stochastic Algorithms with Rapidly Decreasing Ergodicity Rates," Stochastics and Stochastic Reports, vol. 65, no. 3, pp. 177-228, 1999.
[221] D. Yu, S. Wang, and L. Deng, "Sequential Labeling Using Deep-Structured Conditional Random Fields," IEEE J. Selected Topics in Signal Processing, vol. 4, no. 6, pp. 965-973, Dec. 2010.
[222] K. Yu and T. Zhang, "Improved Local Coordinate Coding Using Local Tangents," Proc. Int'l Conf. Machine Learning, 2010.
[223] K. Yu, T. Zhang, and Y. Gong, "Nonlinear Learning Using Local Coordinate Coding," Proc. Neural Information and Processing Systems, 2009.
[224] K. Yu, Y. Lin, and J. Lafferty, "Learning Image Representations from the Pixel Level via Hierarchical Sparse Coding," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2011.
[225] A.L. Yuille, "The Convergence of Contrastive Divergences," Proc. Neural Information and Processing Systems, pp. 1593-1600, 2004.
[226] M. Zeiler, D. Krishnan, G. Taylor, and R. Fergus, "Deconvolutional Networks," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2010.
[227] W.Y. Zou, A.Y. Ng, and K. Yu, "Unsupervised Learning of Visual Invariance with Temporal Coherence," Proc. NIPS Workshop Deep Learning and Unsupervised Feature Learning, 2011.