Issue No.06 - June (2013 vol.35)
pp: 1523-1534
Sotirios P. Chatzis, Cyprus University of Technology, Limassol
Yiannis Demiris, Imperial College London, London
ABSTRACT
Sequential data labeling is a fundamental task in machine learning applications, with speech and natural language processing, activity recognition in video sequences, and biomedical data analysis being characteristic examples, to name just a few. The conditional random field (CRF), a log-linear model representing the conditional distribution of the observation labels, is one of the most successful approaches for sequential data labeling and classification, and has lately received significant attention in machine learning as it achieves superb prediction performance in a variety of scenarios. Nevertheless, existing CRF formulations can capture only one- or few-timestep interactions and neglect higher order dependences, which are potentially useful in many real-life sequential data modeling applications. To resolve these issues, in this paper we introduce a novel CRF formulation, based on the postulation of an energy function which entails infinitely long time-dependences between the modeled data. Building blocks of our novel approach are: 1) the sequence memoizer (SM), a recently proposed nonparametric Bayesian approach for modeling label sequences with infinitely long time dependences, and 2) a mean-field-like approximation of the model marginal likelihood, which allows for the derivation of computationally efficient inference algorithms for our model. The efficacy of the so-obtained infinite-order CRF (${\rm CRF}^{\infty}$) model is experimentally demonstrated.
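For orientation, the one-timestep baseline that the abstract contrasts against can be sketched as a first-order linear-chain CRF. The snippet below is a minimal illustration only, not the infinite-order model proposed in the paper: it assumes the log-linear features have already been collapsed into per-position label scores (`unary`) and label-to-label transition scores (`pairwise`), and all names and toy numbers are hypothetical.

```python
import itertools
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sequence_score(labels, unary, pairwise):
    """Unnormalized log-score of one label sequence.

    unary[t][y] scores label y at position t given the observations;
    pairwise[a][b] scores the transition a -> b (one-timestep interaction).
    """
    score = sum(unary[t][y] for t, y in enumerate(labels))
    score += sum(pairwise[a][b] for a, b in zip(labels, labels[1:]))
    return score


def log_partition(unary, pairwise):
    """Log normalizer over all label sequences via the forward recursion."""
    n_labels = len(unary[0])
    alpha = list(unary[0])
    for t in range(1, len(unary)):
        alpha = [
            unary[t][j]
            + log_sum_exp([alpha[i] + pairwise[i][j] for i in range(n_labels)])
            for j in range(n_labels)
        ]
    return log_sum_exp(alpha)


def conditional_log_prob(labels, unary, pairwise):
    """log p(labels | observations) under the first-order model."""
    return sequence_score(labels, unary, pairwise) - log_partition(unary, pairwise)


# Hypothetical toy scores: 3 positions, 2 labels.
unary = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]
pairwise = [[0.6, -0.1], [-0.3, 0.2]]

# Sanity check: probabilities of all 2**3 label sequences sum to one.
total = sum(
    math.exp(conditional_log_prob(seq, unary, pairwise))
    for seq in itertools.product(range(2), repeat=3)
)
```

Because `pairwise` only couples adjacent labels, the forward recursion stays linear in sequence length. Longer-range label dependences of the kind the abstract targets do not fit this factorization directly, which is where the paper's sequence-memoizer-based energy function and mean-field-like approximation come in.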
INDEX TERMS
Computational modeling, Data models, Context, Hidden Markov models, Inference algorithms, Context modeling, Approximation methods, mean-field principle, Conditional random field, sequential data, sequence memoizer
CITATION
Sotirios P. Chatzis, Yiannis Demiris, "The Infinite-Order Conditional Random Field Model for Sequential Data Modeling", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol.35, no. 6, pp. 1523-1534, June 2013, doi:10.1109/TPAMI.2012.208