Selection of Generative Models in Classification
April 2006 (vol. 28 no. 4)
pp. 544-554
This paper addresses the selection of a generative model for supervised classification. Classical model selection criteria assess how well a model fits the data rather than its ability to produce a low classification error rate. A new criterion, the Bayesian Entropy Criterion (BEC), is proposed. It takes the decisional purpose of a model into account by minimizing the integrated classification entropy, and it offers an alternative to the cross-validated error rate, which is computationally expensive. The asymptotic behavior of BEC is presented. Numerical experiments on both simulated and real data sets show that BEC outperforms BIC at selecting a model that minimizes the classification error rate, and that it performs comparably to the cross-validated error rate.
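To make the contrast between a fit-based criterion and the cross-validated error rate concrete, the following is a minimal sketch and not the paper's method: it does not implement BEC. It assumes a toy one-dimensional problem with two candidate generative models (Gaussian class-conditional densities with either a shared or a class-specific variance) and scores each candidate with BIC and with a K-fold cross-validated error rate. All function names, the synthetic data, and the fold scheme are illustrative assumptions.

```python
import math
import random


def fit(xs, zs, shared_var):
    """ML fit of one 1-D Gaussian per class; optionally tie the variances."""
    n = len(xs)
    params = {}
    for k in sorted(set(zs)):
        xk = [x for x, z in zip(xs, zs) if z == k]
        mu = sum(xk) / len(xk)
        var = max(sum((x - mu) ** 2 for x in xk) / len(xk), 1e-9)
        params[k] = {"prior": len(xk) / n, "mu": mu, "var": var, "n": len(xk)}
    if shared_var:
        # Pooled ML variance, used by every class.
        pooled = sum(p["n"] * p["var"] for p in params.values()) / n
        for p in params.values():
            p["var"] = pooled
    return params


def log_joint(x, k, params):
    """log p(x, z=k) under the fitted generative model."""
    p = params[k]
    return (math.log(p["prior"])
            - 0.5 * math.log(2 * math.pi * p["var"])
            - 0.5 * (x - p["mu"]) ** 2 / p["var"])


def predict(x, params):
    """Assign x to the class with the largest joint (hence posterior) probability."""
    return max(params, key=lambda k: log_joint(x, k, params))


def bic(xs, zs, shared_var):
    """BIC of the joint model: -2 log-likelihood + (free parameters) * log n."""
    params = fit(xs, zs, shared_var)
    n_classes = len(params)
    loglik = sum(log_joint(x, z, params) for x, z in zip(xs, zs))
    # (K - 1) priors + K means + 1 shared or K class-specific variances.
    n_free = (n_classes - 1) + n_classes + (1 if shared_var else n_classes)
    return -2 * loglik + n_free * math.log(len(xs))


def cv_error(xs, zs, shared_var, n_folds=5):
    """K-fold cross-validated error rate; the model is refitted once per fold."""
    n = len(xs)
    errors = 0
    for f in range(n_folds):
        test_idx = set(range(f, n, n_folds))  # interleaved folds
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_z = [zs[i] for i in range(n) if i not in test_idx]
        params = fit(train_x, train_z, shared_var)
        errors += sum(predict(xs[i], params) != zs[i] for i in test_idx)
    return errors / n


# Synthetic data (an assumption for illustration): unequal class variances.
rng = random.Random(0)
xs = ([rng.gauss(0.0, 1.0) for _ in range(100)]
      + [rng.gauss(3.0, 2.0) for _ in range(100)])
zs = [0] * 100 + [1] * 100

for shared in (True, False):
    name = "shared variance" if shared else "class-specific variance"
    print(f"{name}: BIC = {bic(xs, zs, shared):.1f}, "
          f"CV error = {cv_error(xs, zs, shared):.3f}")
```

In this sketch BIC needs a single fit per candidate, whereas the cross-validated error rate refits the model once per fold. That is the computational cost the abstract refers to, and it grows with the number of folds and candidate models.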

Index Terms:
Generative classification, integrated likelihood, integrated conditional likelihood, classification entropy, cross-validated error rate, AIC and BIC criteria.
Guillaume Bouchard, Gilles Celeux, "Selection of Generative Models in Classification," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 4, pp. 544-554, April 2006, doi:10.1109/TPAMI.2006.82