Issue No. 09 - Sept. 2013 (vol. 35), pp. 2104-2116
Lei Yuan , Dept. of Comput. Sci. & Eng., Arizona State Univ., Tempe, AZ, USA
Jun Liu , Siemens Corp. Res., Princeton, NJ, USA
Jieping Ye , Dept. of Comput. Sci. & Eng., Arizona State Univ., Tempe, AZ, USA
ABSTRACT
The group Lasso is an extension of the Lasso for feature selection on (predefined) nonoverlapping groups of features. The nonoverlapping group structure limits its applicability in practice. There have been several recent attempts to study a more general formulation in which groups of features are given, potentially with overlaps between the groups. The resulting optimization is, however, much more challenging to solve due to the group overlaps. In this paper, we consider the efficient optimization of the overlapping group Lasso penalized problem. We reveal several key properties of the proximal operator associated with the overlapping group Lasso, and compute the proximal operator by solving the smooth and convex dual problem, which allows the use of gradient-descent-type algorithms for the optimization. Our methods and theoretical results are then generalized to tackle the general overlapping group Lasso formulation based on the $\ell_q$ norm. We further extend our algorithm to solve a nonconvex overlapping group Lasso formulation based on capped-norm regularization, which reduces the estimation bias introduced by the convex penalty. We have performed empirical evaluations using both a synthetic dataset and a breast cancer gene expression dataset, which consists of 8,141 genes organized into (overlapping) gene sets. Experimental results show that the proposed algorithm is more efficient than existing state-of-the-art algorithms. The results also demonstrate the effectiveness of the nonconvex formulation for overlapping group Lasso.
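The computational core described above is the proximal operator of the overlapping group Lasso penalty. The following is a minimal NumPy sketch of one way to compute it via the smooth convex dual mentioned in the abstract: each group contributes a dual variable constrained to an $\ell_2$-ball, the dual objective is a simple least-squares term, and the proximal point is recovered as the residual. The function name, the cyclic projected block-coordinate solver, and the stopping rule are illustrative assumptions, not the authors' SLEP implementation [22].

import numpy as np

def prox_overlapping_group_lasso(v, groups, weights, lam, max_iter=500, tol=1e-10):
    """Proximal operator of the overlapping group Lasso penalty,

        pi(v) = argmin_x 0.5*||x - v||_2^2 + lam * sum_i w_i * ||x_{G_i}||_2,

    computed through the smooth convex dual: minimize
    0.5*||v - sum_i y_i||_2^2 over dual variables y_i supported on
    group G_i with ||y_i||_2 <= lam * w_i, then recover
    x = v - sum_i y_i. Solved here by cyclic projected
    block-coordinate descent; projecting onto each l2-ball is a
    simple rescaling.
    """
    v = np.asarray(v, dtype=float)
    # counts[j] = number of groups covering coordinate j; the dual
    # gradient is Lipschitz with constant max(counts), which gives a
    # safe step size.
    counts = np.zeros(v.shape[0])
    for g in groups:
        counts[np.asarray(g)] += 1
    step = 1.0 / max(counts.max(), 1.0)

    Y = [np.zeros(len(g)) for g in groups]
    x = v.copy()                          # maintains x = v - sum_i y_i
    for _ in range(max_iter):
        x_prev = x.copy()
        for i, (g, w) in enumerate(zip(groups, weights)):
            idx = np.asarray(g)
            y = Y[i] + step * x[idx]      # gradient wrt y_i is -(x on G_i)
            radius = lam * w
            nrm = np.linalg.norm(y)
            if nrm > radius:              # project onto the l2-ball
                y = y * (radius / nrm)
            x[idx] += Y[i] - y            # keep the residual consistent
            Y[i] = y
        if np.linalg.norm(x - x_prev) <= tol * max(1.0, np.linalg.norm(v)):
            break
    return x

# Example: two groups overlapping at coordinate 1 (illustrative values).
x = prox_overlapping_group_lasso([3.0, -2.0, 0.5, 4.0],
                                 groups=[[0, 1], [1, 2, 3]],
                                 weights=[1.0, 1.0], lam=1.0)

Given this operator, the full penalized loss can be minimized with an accelerated proximal gradient scheme such as FISTA [17], matching the gradient-descent-type algorithms referred to in the abstract.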
INDEX TERMS
Optimization, convergence, indexes, algorithm design and analysis, acceleration, convex functions, difference of convex programming, sparse learning, overlapping group Lasso, proximal operator
CITATION
Lei Yuan, Jun Liu, Jieping Ye, "Efficient Methods for Overlapping Group Lasso", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 35, no. 9, pp. 2104-2116, Sept. 2013, doi:10.1109/TPAMI.2013.17
REFERENCES
[1] R. Tibshirani, "Regression Shrinkage and Selection via the Lasso," J. Royal Statistical Soc. Series B, vol. 58, no. 1, pp. 267-288, 1996.
[2] M. Yuan and Y. Lin, "Model Selection and Estimation in Regression with Grouped Variables," J. Royal Statistical Soc. Series B, vol. 68, no. 1, pp. 49-67, 2006.
[3] H. Liu, M. Palatucci, and J. Zhang, "Blockwise Coordinate Descent Procedures for the Multi-Task Lasso, with Applications to Neural Semantic Basis Discovery," Proc. 26th Ann. Int'l Conf. Machine Learning, 2009.
[4] J. Liu, S. Ji, and J. Ye, "Multi-Task Feature Learning via Efficient $\ell_{2,1}$ -Norm Minimization," Proc. 25th Conf. Uncertainty in Artificial Intelligence, 2009.
[5] L. Meier, S. Geer, and P. Bühlmann, "The Group Lasso for Logistic Regression," J. Royal Statistical Soc. Series B, vol. 70, pp. 53-71, 2008.
[6] L. Jacob, G. Obozinski, and J. Vert, "Group Lasso with Overlap and Graph Lasso," Proc. 26th Ann. Int'l Conf. Machine Learning, 2009.
[7] H.D. Bondell and B.J. Reich, "Simultaneous Regression Shrinkage, Variable Selection and Clustering of Predictors with Oscar," Biometrics, vol. 64, pp. 115-123, 2008.
[8] S. Kim and E.P. Xing, "Tree-Guided Group Lasso for Multi-Task Regression with Structured Sparsity," Proc. Int'l Conf. Machine Learning, 2010.
[9] J. Mairal, R. Jenatton, G. Obozinski, and F. Bach, "Network Flow Algorithms for Structured Sparsity," Proc. Advances in Neural Information Processing Systems, pp. 1558-1566, 2010.
[10] P. Zhao, G. Rocha, and B. Yu, "The Composite Absolute Penalties Family for Grouped and Hierarchical Variable Selection," Annals of Statistics, vol. 37, no. 6A, pp. 3468-3497, 2009.
[11] S. Mosci, S. Villa, A. Verri, and L. Rosasco, "A Primal-Dual Algorithm for Group Sparse Regularization with Overlapping Groups," Proc. Advances in Neural Information Processing Systems, 2010.
[12] Z. Qin and D. Goldfarb, "Structured Sparsity via Alternating Direction Methods," arXiv preprint arXiv:1105.0728, 2011.
[13] R. Jenatton, J.-Y. Audibert, and F. Bach, "Structured Variable Selection with Sparsity-Inducing Norms," arXiv preprint arXiv:0904.3523, 2009.
[14] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Now Publishers Inc., 2010.
[15] A. Argyriou, C. Micchelli, M. Pontil, L. Shen, and Y. Xu, "Efficient First Order Methods for Linear Composite Regularizers," arXiv preprint arXiv:1104.1436, 2011.
[16] X. Chen, Q. Lin, S. Kim, J. Carbonell, and E. Xing, "Smoothing Proximal Gradient Method for General Structured Sparse Learning," Annals of Applied Statistics, vol. 6, no. 2, pp. 719-752, 2012.
[17] A. Beck and M. Teboulle, "A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems," SIAM J. Imaging Sciences, vol. 2, no. 1, pp. 183-202, 2009.
[18] A. Nemirovski, Efficient Methods in Convex Programming, 1994.
[19] Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, 2004.
[20] J.-J. Moreau, "Proximité et Dualité dans un Espace Hilbertien," Bull. Soc. Math. France, vol. 93, pp. 273-299, 1965.
[21] R. Jenatton, J. Mairal, G. Obozinski, and F. Bach, "Proximal Methods for Sparse Hierarchical Dictionary Learning," Proc. Int'l Conf. Machine Learning, 2010.
[22] J. Liu and J. Ye, SLEP: Sparse Learning with Efficient Projections, Arizona State Univ., http://www.public.asu.edu/~jye02/Software/SLEP/, 2009.
[23] J. Friedman, T. Hastie, H. Höfling, and R. Tibshirani, "Pathwise Coordinate Optimization," Annals of Applied Statistics, vol. 1, no. 2, pp. 302-332, 2007.
[24] J.F. Bonnans and A. Shapiro, "Optimization Problems with Perturbations: A Guided Tour," SIAM Rev., vol. 40, no. 2, pp. 228-264, 1998.
[25] J.M. Danskin, The Theory of Max-Min and Its Applications to Weapons Allocation Problems. Springer-Verlag, 1967.
[26] Y. Ying, C. Campbell, and M. Girolami, "Analysis of SVM with Indefinite Kernels," Proc. Advances in Neural Information Processing Systems, pp. 2205-2213, 2009.
[27] P. Combettes and J. Pesquet, "Proximal Splitting Methods in Signal Processing," arXiv preprint arXiv:0912.3522, 2009.
[28] J. Liu and J. Ye, "Efficient $\ell_1/\ell_q$ Norm Regularization," arXiv preprint arXiv:1009.4766, 2010.
[29] T. Zhang, "Multi-Stage Convex Relaxation for Feature Selection," arXiv preprint arXiv:1106.0565, 2011.
[30] X. Shen, W. Pan, and Y. Zhu, "Likelihood-Based Selection and Sharp Parameter Estimation," J. Am. Statistical Assoc., vol. 107, no. 497, pp. 223-232, 2012.
[31] T. Zhang, "Analysis of Multi-Stage Convex Relaxation for Sparse Regularization," J. Machine Learning Research, vol. 11, pp. 1081-1107, 2010.
[32] M.J. Van de Vijver et al., "A Gene-Expression Signature as a Predictor of Survival in Breast Cancer," The New England J. Medicine, vol. 347, no. 25, pp. 1999-2009, 2002.
[33] A. Subramanian et al., "Gene Set Enrichment Analysis: A Knowledge-Based Approach for Interpreting Genome-Wide Expression Profiles," Proc. Nat'l Academy of Sciences USA, vol. 102, no. 43, pp. 15545-15550, 2005.
[34] H.Y. Chuang, E. Lee, Y.T. Liu, D. Lee, and T. Ideker, "Network-Based Classification of Breast Cancer Metastasis," Molecular Systems Biology, vol. 3, no. 140, 2007.
[35] R. Rockafellar, "Monotone Operators and the Proximal Point Algorithm," SIAM J. Control and Optimization, vol. 14, pp. 877-898, 1976.
[36] B. He and X. Yuan, "An Accelerated Inexact Proximal Point Algorithm for Convex Minimization," J. Optimization Theory and Applications, vol. 154, p. 536, 2010.
[37] M. Schmidt, N.L. Roux, and F. Bach, "Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization," Proc. Advances in Neural Information Processing Systems, 2011.
[38] J. Liu, P. Musialski, P. Wonka, and J. Ye, "Tensor Completion for Estimating Missing Values in Visual Data," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 208-220, Jan. 2013.