IEEE Transactions on Pattern Analysis & Machine Intelligence, 2012, vol. 34, Issue No. 07 - July

pp: 1249-1262

Alessandro Perina, Microsoft Research, Redmond

Marco Cristani, University of Verona, Verona, and Italian Institute of Technology, Genova

Umberto Castellani, University of Verona, Verona

Vittorio Murino, University of Verona, Verona, and Italian Institute of Technology, Genova

Nebojsa Jojic, Microsoft Research, Redmond

ABSTRACT

A score function induced by a generative model of the data can provide a feature vector of a fixed dimension for each data sample. Data samples themselves may be of differing lengths (e.g., speech segments or other sequential data), but as a score function is based on the properties of the data generation process, it produces a fixed-length vector in a highly informative space, typically referred to as “score space.” Discriminative classifiers have been shown to achieve higher performance in appropriately chosen score spaces than either the corresponding generative likelihood-based classifiers or discriminative classifiers using standard feature extractors. In this paper, we present a novel score space that exploits the free energy associated with a generative model. The resulting free energy score space (FESS) takes into account the latent structure of the data at various levels and can be shown to lead to classification performance that at least matches the performance of the free energy classifier based on the same generative model and the same factorization of the posterior. We also show that in several typical computer vision and computational biology applications the classifiers optimized in FESS outperform the corresponding pure generative approaches, as well as a number of previous approaches combining discriminative and generative models.
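
The idea of a fixed-length score vector for variable-length data can be illustrated with a small sketch. It is not the construction from the paper: it assumes a toy generative model (one mixture of categorical distributions per class, with symbols drawn independently), uses the exact posterior over the mixture component for each symbol, and splits the resulting free energy into per-component entropy, prior, and likelihood terms. All function names, model parameters, and sequences below are hypothetical and chosen only for illustration.

```python
import math


def free_energy_terms(seq, mix, emit):
    """Additive free energy terms of one sequence under one toy mixture model.

    seq  : list of symbol indices (any length)
    mix  : mixing weights, mix[z] = p(z)            (assumed strictly positive)
    emit : emission table, emit[z][x] = p(x | z)    (assumed strictly positive)
    """
    K = len(mix)
    entropy = [0.0] * K  # sum_t q_t(z) log q_t(z)
    prior = [0.0] * K    # -sum_t q_t(z) log p(z)
    lik = [0.0] * K      # -sum_t q_t(z) log p(x_t | z)
    for x in seq:
        joint = [mix[z] * emit[z][x] for z in range(K)]
        total = sum(joint)
        for z in range(K):
            q = joint[z] / total  # exact posterior q_t(z) for this symbol
            if q > 0.0:
                entropy[z] += q * math.log(q)
                prior[z] -= q * math.log(mix[z])
                lik[z] -= q * math.log(emit[z][x])
    # 3 * K numbers, independent of len(seq)
    return entropy + prior + lik


def score_vector(seq, class_models):
    """Concatenate the free energy terms obtained under every class model."""
    vec = []
    for mix, emit in class_models:
        vec.extend(free_energy_terms(seq, mix, emit))
    return vec


def neg_log_likelihood(seq, mix, emit):
    """Negative log-likelihood of the sequence under the same toy model."""
    K = len(mix)
    return -sum(math.log(sum(mix[z] * emit[z][x] for z in range(K))) for x in seq)


# Two hypothetical class models over a 3-symbol alphabet, 2 components each.
model_a = ([0.5, 0.5], [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]])
model_b = ([0.3, 0.7], [[0.2, 0.6, 0.2], [0.4, 0.2, 0.4]])
models = [model_a, model_b]

short_seq = [0, 2]
long_seq = [0, 1, 1, 2, 0, 1]

short_vec = score_vector(short_seq, models)
long_vec = score_vector(long_seq, models)
```

Both `short_vec` and `long_vec` have the same dimension even though the sequences differ in length, so either can be passed to a standard discriminative classifier. Because the exact posterior is used here, the terms computed under one model sum to that model's negative log-likelihood; keeping them separate exposes more of the latent structure than the single summed value would.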

INDEX TERMS

Hybrid generative/discriminative paradigm, variational free energy, classification.

CITATION

Alessandro Perina, Marco Cristani, Umberto Castellani, Vittorio Murino, Nebojsa Jojic, "Free Energy Score Spaces: Using Generative Information in Discriminative Classifiers",

*IEEE Transactions on Pattern Analysis & Machine Intelligence*, vol. 34, no. 7, pp. 1249-1262, July 2012, doi:10.1109/TPAMI.2011.241