Learning Flexible Features for Conditional Random Fields
August 2008 (vol. 30 no. 8)
pp. 1415-1426
Extending traditional models for discriminative labeling of structured data to include higher-order structure in the labels results in an undesirable exponential increase in model complexity. In this paper, we present a model that is capable of learning such structures using a random field of parameterized features. These features can be functions of arbitrary combinations of observations, labels and auxiliary hidden variables. We also present a simple induction scheme to learn these features, which can automatically determine the complexity needed for a given data set. We apply the model to two real-world tasks, information extraction and image labeling, and compare our results to several other methods for discriminative labeling.
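The abstract describes conditional random fields whose features are learned, parameterized functions rather than fixed indicator functions. As a point of reference for the CRF framework the paper builds on, below is a minimal sketch of the log-likelihood of a plain linear-chain CRF in which each unary feature is a learned linear function of the observation vector. This is an illustrative baseline only, not the authors' model (which also uses auxiliary hidden variables and a feature-induction scheme); all names and shapes here are assumptions.

```python
import numpy as np

def logsumexp(a, axis=None):
    """Numerically stable log-sum-exp."""
    m = np.max(a, axis=axis, keepdims=True)
    s = m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))
    return s.squeeze(axis) if axis is not None else s.item()

def crf_log_likelihood(X, y, W, trans):
    """Log p(y | X) for a linear-chain CRF (illustrative sketch).

    X:     (T, D) observation vectors
    y:     (T,)   integer labels in {0..K-1}
    W:     (K, D) parameters of the unary features: score(k, x_t) = W[k] . x_t
    trans: (K, K) pairwise label-transition scores
    """
    T = X.shape[0]
    unary = X @ W.T  # (T, K) parameterized unary feature scores

    # Unnormalized log-score of the observed label path.
    path = unary[np.arange(T), y].sum() + trans[y[:-1], y[1:]].sum()

    # Forward recursion in log space to compute the partition function.
    alpha = unary[0].copy()
    for t in range(1, T):
        alpha = unary[t] + logsumexp(alpha[:, None] + trans, axis=0)

    return path - logsumexp(alpha)
```

Because the partition function normalizes over all label sequences, exponentiating the returned log-likelihood over every possible labeling sums to one; gradient-based training would maximize this quantity with respect to `W` and `trans`.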


Index Terms:
machine learning, statistical models, induction, text analysis, pixel classification, Markov random fields
Citation:
Liam Stewart, Xuming He, Richard S. Zemel, "Learning Flexible Features for Conditional Random Fields," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 8, pp. 1415-1426, Aug. 2008, doi:10.1109/TPAMI.2007.70790