Issue No. 12, Dec. 2013 (vol. 35)
pp. 3025-3036
Xiao-Tong Yuan, Sch. of Inf. & Control, Nanjing Univ. of Inf. Sci. & Technol., Nanjing, China
Shuicheng Yan, Dept. of Electr. & Comput. Eng., Nat. Univ. of Singapore, Singapore, Singapore
ABSTRACT
The forward greedy selection algorithm of Frank and Wolfe has recently been applied with success to coordinate-wise sparse learning problems, which are characterized by a tradeoff between sparsity and accuracy. In this paper, we generalize this method to the setup of pursuing sparse representations over a fixed, predefined dictionary. Our proposed algorithm iteratively selects an atom from the dictionary and minimizes the objective function over the linear combinations of all the selected atoms. The rate of convergence of this greedy selection procedure is analyzed. Furthermore, we extend the algorithm to the setup of learning nonnegative and convex sparse representations over a dictionary. Applications of the proposed algorithms to sparse precision matrix estimation and low-rank subspace segmentation are investigated, and their efficiency and effectiveness are validated on benchmark datasets.
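The select-then-refit loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a least-squares objective f(w) = 0.5 * ||D w - y||^2, a dictionary D whose columns are the atoms, and a fixed number of iterations k. The function name `forward_basis_selection` and all parameter names are hypothetical. Under these assumptions the procedure resembles orthogonal matching pursuit.

```python
import numpy as np

def forward_basis_selection(D, y, k):
    """Greedy sketch (assumed least-squares objective, not the paper's code).

    D : (n, m) array, one dictionary atom per column
    y : (n,) target vector
    k : number of atoms to select (k <= m)
    Returns the coefficient vector w of length m and the selected atom indices.
    """
    n, m = D.shape
    w = np.zeros(m)
    chosen = np.zeros(m, dtype=bool)   # marks atoms that are already selected
    selected = []
    for _ in range(k):
        # Gradient of 0.5 * ||D w - y||^2 with respect to w.
        grad = D.T @ (D @ w - y)
        # Selection step: pick the unselected atom with the largest
        # gradient magnitude.
        scores = np.abs(grad)
        scores[chosen] = -np.inf
        j = int(np.argmax(scores))
        chosen[j] = True
        selected.append(j)
        # Refit step: minimize the objective over all linear
        # combinations of the atoms selected so far.
        coef, *_ = np.linalg.lstsq(D[:, selected], y, rcond=None)
        w = np.zeros(m)
        w[selected] = coef
    return w, selected
```

Calling `forward_basis_selection(D, y, k)` with a small `k` returns a coefficient vector with at most `k` nonzero entries. The nonnegative and convex variants mentioned in the abstract would need a constrained refit step in place of the unconstrained least-squares solve, which this sketch does not attempt.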
INDEX TERMS
Greedy algorithms, sparse matrices, Gaussian processes, dictionaries, subspace segmentation, greedy selection, sparse representation, optimization, Gaussian graphical models
CITATION
Xiao-Tong Yuan, Shuicheng Yan, "Forward Basis Selection for Pursuing Sparse Representations over a Dictionary", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 35, no. 12, pp. 3025-3036, Dec. 2013, doi:10.1109/TPAMI.2013.85