Learning Flexible Features for Conditional Random Fields
August 2008 (vol. 30 no. 8)
pp. 1415-1426
Extending traditional models for discriminative labeling of structured data to include higher-order structure in the labels results in an undesirable exponential increase in model complexity. In this paper, we present a model that is capable of learning such structures using a random field of parameterized features. These features can be functions of arbitrary combinations of observations, labels and auxiliary hidden variables. We also present a simple induction scheme to learn these features, which can automatically determine the complexity needed for a given data set. We apply the model to two real-world tasks, information extraction and image labeling, and compare our results to several other methods for discriminative labeling.
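The induction scheme the abstract alludes to can be pictured, in a much-simplified and unstructured form, as greedily adding the feature whose likelihood gradient is largest, then refitting the weights. The sketch below is an illustrative toy only (a log-linear classifier rather than a full random field; all function names and the data are assumptions, not the paper's code):

```python
import math

# Toy sketch of gradient-based feature induction for a conditional
# log-linear model: a simplified, unstructured analogue of the scheme
# described in the abstract. Names and data are illustrative assumptions.

def candidate_features(n_obs):
    # Each candidate fires when a particular observation bit is on.
    return [lambda x, i=i: float(x[i]) for i in range(n_obs)]

def p_y(x, feats, weights):
    # Conditional probability p(y = 1 | x) under the log-linear model.
    s = sum(w * f(x) for w, f in zip(weights, feats))
    return 1.0 / (1.0 + math.exp(-s))

def induce(data, candidates, rounds=2, lr=0.5, steps=200):
    feats, weights = [], []
    for _ in range(rounds):
        # Score each unused candidate by the magnitude of the gradient
        # its weight would receive if it were added at zero.
        best, best_gain = None, 0.0
        for f in candidates:
            if f in feats:
                continue
            g = sum((y - p_y(x, feats, weights)) * f(x) for x, y in data)
            if abs(g) > best_gain:
                best, best_gain = f, abs(g)
        if best is None:
            break  # no candidate improves the likelihood; complexity is set
        feats.append(best)
        weights.append(0.0)
        # Refit all weights by gradient ascent on the conditional likelihood.
        for _ in range(steps):
            for j, f in enumerate(feats):
                g = sum((y - p_y(x, feats, weights)) * f(x) for x, y in data)
                weights[j] += lr * g / len(data)
    return feats, weights
```

The stopping test (no candidate with nonzero gain) is what lets such a scheme "automatically determine the complexity needed for a given data set"; the paper's actual model additionally parameterizes the features themselves and couples them through labels and hidden variables.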

Index Terms:
machine learning, statistical models, induction, text analysis, pixel classification, Markov random fields
Liam Stewart, Xuming He, Richard S. Zemel, "Learning Flexible Features for Conditional Random Fields," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 8, pp. 1415-1426, Aug. 2008, doi:10.1109/TPAMI.2007.70790