Task-Driven Dictionary Learning
April 2012 (vol. 34, no. 4)
pp. 791-804
J. Mairal, Dept. of Stat., Univ. of California, Berkeley, CA, USA
F. Bach, Lab. d'Inf., École Normale Supérieure, Paris, France
J. Ponce, INRIA-Willow Project-Team, École Normale Supérieure, Paris, France
Modeling data with linear combinations of a few elements from a learned dictionary has been the focus of much recent research in machine learning, neuroscience, and signal processing. For signals such as natural images that admit such sparse representations, it is now well established that these models are well suited to restoration tasks. In this context, learning the dictionary amounts to solving a large-scale matrix factorization problem, which can be done efficiently with classical optimization tools. The same approach has also been used for learning features from data for other purposes, e.g., image classification, but tuning the dictionary in a supervised way for these tasks has proven to be more difficult. In this paper, we present a general formulation for supervised dictionary learning adapted to a wide variety of tasks, and present an efficient algorithm for solving the corresponding optimization problem. Experiments on handwritten digit classification, digital art identification, nonlinear inverse image problems, and compressed sensing demonstrate that our approach is effective in large-scale settings, and is well suited to supervised and semi-supervised classification, as well as regression tasks for data that admit sparse representations.
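Schematically, the supervised formulation the abstract refers to couples an elastic-net sparse coding step with a task-specific loss in a bilevel objective; the display below is an illustrative sketch in the spirit of the paper (notation ours, not quoted from it):

\alpha^\star(x, D) = \arg\min_{\alpha} \tfrac{1}{2}\|x - D\alpha\|_2^2 + \lambda_1\|\alpha\|_1 + \tfrac{\lambda_2}{2}\|\alpha\|_2^2,
\qquad
\min_{D, W}\; \mathbb{E}_{(y,x)}\!\left[\ell\!\left(y, W, \alpha^\star(x, D)\right)\right] + \tfrac{\nu}{2}\|W\|_F^2,

where D is the dictionary, W holds the task parameters (e.g., a classifier or regressor), and \ell is the task loss, so that D is tuned for the downstream task rather than for reconstruction alone.

The Python sketch below, which assumes scikit-learn (not the authors' code) and its MiniBatchDictionaryLearning and LogisticRegression estimators, illustrates the decoupled baseline that such supervised dictionary learning improves on: learn D from the data alone, compute sparse codes, then fit a separate classifier on those codes.

# Baseline pipeline: unsupervised dictionary, then a separate classifier on sparse codes.
# A task-driven approach would instead optimize the dictionary jointly with the classifier;
# this sketch only shows the decoupled two-stage baseline.
from sklearn.datasets import load_digits
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)   # 8x8 handwritten digits, flattened to 64-dim vectors
X = X / 16.0                          # scale pixel intensities to [0, 1]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Learn an overcomplete dictionary D (128 atoms) without using the labels.
dico = MiniBatchDictionaryLearning(n_components=128, alpha=1.0,
                                   transform_algorithm="lasso_lars",
                                   transform_alpha=1.0, random_state=0)
dico.fit(X_tr)

# Sparse codes alpha*(x, D) serve as features for a separate linear classifier.
A_tr = dico.transform(X_tr)
A_te = dico.transform(X_te)
clf = LogisticRegression(max_iter=1000).fit(A_tr, y_tr)
print("test accuracy on sparse codes:", clf.score(A_te, y_te))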

Index Terms:
regression analysis, compressed sensing, data models, handwritten character recognition, image classification, image representation, image restoration, learning (artificial intelligence), matrix decomposition, regression tasks, task-driven dictionary learning, data modeling, linear combinations, learned dictionary, machine learning, neuroscience, signal processing, natural images, sparse representations, restoration tasks, large-scale matrix factorization, classical optimization tools, supervised dictionary learning, handwritten digit classification, digital art identification, nonlinear inverse image problems, semi-supervised classification, dictionaries, sparse matrices, vectors, sensors, cost function, basis pursuit, Lasso, dictionary learning, matrix factorization, semi-supervised learning
Citation:
J. Mairal, F. Bach, J. Ponce, "Task-Driven Dictionary Learning," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 4, pp. 791-804, April 2012, doi:10.1109/TPAMI.2011.156