Bayesian Gaussian Process Classification with the EM-EP Algorithm
December 2006 (vol. 28, no. 12), pp. 1948-1959
Gaussian process classifiers (GPCs) are Bayesian probabilistic kernel classifiers. In GPCs, the probability of belonging to a certain class at an input location is monotonically related to the value of a latent function at that location. Starting from a Gaussian process prior over this latent function, data are used to infer both the posterior over the latent function and the values of the hyperparameters that govern various aspects of the function. Recently, the expectation propagation (EP) approach has been proposed to infer the posterior over the latent function. Building on this work, we present an approximate EM algorithm, the EM-EP algorithm, for learning both the latent function and the hyperparameters. The algorithm is found to converge in practice and provides an efficient Bayesian framework for learning the kernel hyperparameters. A multiclass extension of the EM-EP algorithm for GPCs is also derived. In experiments, the EM-EP algorithms perform as well as or better than other methods for GPCs and than Support Vector Machines (SVMs) tuned by cross-validation.
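For readers who want a concrete starting point, the sketch below illustrates the overall Bayesian GPC workflow the abstract describes: place a GP prior over a latent function, approximate its posterior, and fit kernel hyperparameters by maximizing an approximate marginal likelihood. The paper's EM-EP algorithm is not implemented in standard libraries; scikit-learn's GaussianProcessClassifier uses the Laplace approximation rather than EP, so this is only a stand-in for the general workflow, and the toy dataset and kernel settings are illustrative assumptions, not the paper's experimental setup.

# Minimal sketch of the Bayesian GPC workflow, using scikit-learn.
# Note: scikit-learn approximates the posterior over the latent function
# with the Laplace approximation (not EP, as in the paper) and learns the
# kernel hyperparameters by maximizing the approximate marginal likelihood.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Toy binary classification data (illustrative stand-in for real benchmarks).
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# GP prior over the latent function: squared-exponential kernel with a signal
# variance and a length-scale, both treated as hyperparameters to be learned.
kernel = ConstantKernel(1.0) * RBF(length_scale=1.0)

gpc = GaussianProcessClassifier(kernel=kernel, n_restarts_optimizer=5,
                                random_state=0)
gpc.fit(X, y)  # posterior approximation + hyperparameter optimization

print("Learned kernel:", gpc.kernel_)
print("Log marginal likelihood:",
      gpc.log_marginal_likelihood(gpc.kernel_.theta))
print("Class probabilities for first 5 points:")
print(gpc.predict_proba(X[:5]))

In the EM-EP algorithm described above, EP plays the role of the E-step (approximating the posterior over the latent function) and the kernel hyperparameters are updated in the M-step; the sketch merely substitutes scikit-learn's Laplace-based counterparts for both steps.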

Index Terms:
Gaussian process classification, Bayesian methods, kernel methods, expectation propagation, EM-EP algorithm.
Citation:
Hyun-Chul Kim, Zoubin Ghahramani, "Bayesian Gaussian Process Classification with the EM-EP Algorithm," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 12, pp. 1948-1959, Dec. 2006, doi:10.1109/TPAMI.2006.238