
Balaji Krishnapuram, Lawrence Carin, Mário A.T. Figueiredo, Alexander J. Hartemink, "Sparse Multinomial Logistic Regression: Fast Algorithms and Generalization Bounds," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 6, pp. 957-968, June 2005.
[1] P. Bartlett and S. Mendelson, “Rademacher and Gaussian Complexities: Risk Bounds and Structural Results,” J. Machine Learning Research, vol. 3, pp. 463-482, 2002.
[2] D. Böhning, “Multinomial Logistic Regression Algorithm,” Annals of the Inst. of Statistical Math., vol. 44, pp. 197-200, 1992.
[3] D. Böhning and B. Lindsay, “Monotonicity of Quadratic-Approximation Algorithms,” Annals of the Inst. of Statistical Math., vol. 40, pp. 641-663, 1988.
[4] S. Chen, D. Donoho, and M. Saunders, “Atomic Decomposition by Basis Pursuit,” SIAM J. Scientific Computing, vol. 20, pp. 33-61, 1998.
[5] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines. Cambridge, U.K.: Cambridge Univ. Press, 2000.
[6] L. Csató and M. Opper, “Sparse Online Gaussian Processes,” Neural Computation, vol. 14, no. 3, pp. 641-668, 2002.
[7] J. de Leeuw and G. Michailides, “Block Relaxation Methods in Statistics,” technical report, Dept. of Statistics, Univ. of California at Los Angeles, 1993.
[8] A. Dempster, N. Laird, and D. Rubin, “Maximum Likelihood Estimation from Incomplete Data via the EM Algorithm,” J. Royal Statistical Soc. B, vol. 39, pp. 1-38, 1977.
[9] D. Donoho and M. Elad, “Optimally Sparse Representations in General Nonorthogonal Dictionaries by $l_1$ Minimization,” Proc. Nat'l Academy of Sciences, vol. 100, no. 5, pp. 2197-2202, 2003.
[10] M. Figueiredo and A. Jain, “Bayesian Learning of Sparse Classifiers,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 1, pp. 35-41, 2001.
[11] M. Figueiredo, “Adaptive Sparseness for Supervised Learning,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 25, pp. 1150-1159, 2003.
[12] J. Friedman, T. Hastie, S. Rosset, R. Tibshirani, and J. Zhu, “Discussion of Boosting Papers,” The Annals of Statistics, vol. 32, no. 1, pp. 102-107, 2004.
[13] T. Graepel, R. Herbrich, and J. Shawe-Taylor, “Generalisation Error Bounds for Sparse Linear Classifiers,” Proc. Conf. Computational Learning Theory, pp. 298-303, 2000.
[14] T. Graepel, R. Herbrich, and R.C. Williamson, “From Margin to Sparsity,” Proc. Neural Information Processing Systems (NIPS) 13, pp. 210-216, 2001.
[15] R. Herbrich, Learning Kernel Classifiers: Theory and Algorithms. Cambridge, Mass.: MIT Press, 2002.
[16] B. Krishnapuram, L. Carin, and A. Hartemink, “Joint Classifier and Feature Optimization for Cancer Diagnosis Using Gene Expression Data,” Proc. Int'l Conf. Research in Computational Molecular Biology, 2003.
[17] B. Krishnapuram, A. Hartemink, L. Carin, and M. Figueiredo, “A Bayesian Approach to Joint Feature Selection and Classifier Design,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 26, pp. 1105-1111, 2004.
[18] K. Lange, Optimization. New York: Springer Verlag, 2004.
[19] K. Lange, D. Hunter, and I. Yang, “Optimization Transfer Using Surrogate Objective Functions,” J. Computational and Graphical Statistics, vol. 9, pp. 1-59, 2000.
[20] J. Langford and J. Shawe-Taylor, “PAC-Bayes and Margins,” Advances in Neural Information Processing Systems 15, S. Becker, S. Thrun, and K. Obermayer, eds., pp. 423-430, Cambridge, Mass.: MIT Press, 2003.
[21] J. Langford, “Practical Prediction Theory for Classification,” Proc. Int'l Conf. Machine Learning, T. Fawcett and N. Mishra, eds., 2003.
[22] N.D. Lawrence, M. Seeger, and R. Herbrich, “Fast Sparse Gaussian Process Methods: The Informative Vector Machine,” Advances in Neural Information Processing Systems 15, S. Becker, S. Thrun, and K. Obermayer, eds., pp. 609-616, Cambridge, Mass.: MIT Press, 2003.
[23] M. Lewicki and T. Sejnowski, “Learning Overcomplete Representations,” Neural Computation, vol. 12, pp. 337-365, 2000.
[24] S. Mallat, A Wavelet Tour of Signal Processing. San Diego, Calif.: Academic Press, 1998.
[25] D. McAllester, “Some PAC-Bayesian Theorems,” Machine Learning, vol. 37, pp. 355-363, 1999.
[26] R. Meir and T. Zhang, “Generalization Error Bounds for Bayesian Mixture Algorithms,” J. Machine Learning Research, vol. 4, pp. 839-860, 2003.
[27] T. Minka, “A Comparison of Numerical Optimizers for Logistic Regression,” technical report, Dept. of Statistics, Carnegie Mellon Univ., 2003.
[28] R. Neal, Bayesian Learning for Neural Networks. New York: Springer Verlag, 1996.
[29] A.Y. Ng, “Feature Selection, L1 vs. L2 Regularization, and Rotational Invariance,” Proc. Int'l Conf. Machine Learning, 2004.
[30] B. Olshausen and D. Field, “Emergence of Simple-Cell Receptive Field Properties by Learning a Sparse Code for Natural Images,” Nature, vol. 381, pp. 607-609, 1996.
[31] Y. Qi, T.P. Minka, R.W. Picard, and Z. Ghahramani, “Predictive Automatic Relevance Determination by Expectation Propagation,” Proc. Int'l Conf. Machine Learning, 2004.
[32] R. Salakhutdinov and S. Roweis, “Adaptive Overrelaxed Bound Optimization Methods,” Proc. Int'l Conf. Machine Learning, pp. 664-671, 2003.
[33] M. Seeger, “PAC-Bayesian Generalization Error Bounds for Gaussian Process Classification,” J. Machine Learning Research, vol. 3, pp. 233-269, 2002.
[34] R. Tibshirani, “Regression Shrinkage and Selection via the LASSO,” J. Royal Statistical Soc. B, vol. 58, no. 1, pp. 267-288, 1996.
[35] M. Tipping, “Sparse Bayesian Learning and the Relevance Vector Machine,” J. Machine Learning Research, vol. 1, pp. 211-244, 2001.
[36] M. Tipping and A. Faul, “Fast Marginal Likelihood Maximisation for Sparse Bayesian Models,” Proc. Ninth Int'l Workshop Artificial Intelligence and Statistics, C. Bishop and B. Frey, eds., 2003.
[37] L. Valiant, “A Theory of the Learnable,” Comm. ACM, vol. 27, pp. 1134-1142, 1984.
[38] V. Vapnik, Statistical Learning Theory. New York: John Wiley, 1998.
[39] J. Weston, A. Elisseeff, B. Schölkopf, and M. Tipping, “Use of the Zero-Norm with Linear Models and Kernel Methods,” J. Machine Learning Research, vol. 3, pp. 1439-1461, 2003.
[40] C. Williams and D. Barber, “Bayesian Classification with Gaussian Priors,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 12, pp. 1342-1351, Dec. 1998.
[41] P. Williams, “Bayesian Regularization and Pruning Using a Laplace Prior,” Neural Computation, vol. 7, pp. 117-143, 1995.
[42] T. Zhang and F. Oles, “Regularized Linear Classification Methods,” Information Retrieval, vol. 4, pp. 5-31, 2001.
[43] J. Zhu and T. Hastie, “Kernel Logistic Regression and the Import Vector Machine,” Advances in Neural Information Processing Systems 14, T. Dietterich, S. Becker, and Z. Ghahramani, eds., pp. 1081-1088, Cambridge, Mass.: MIT Press, 2002.