Issue No. 10, Oct. 2013 (vol. 35)
pp. 2454-2467
J. Domke, NICTA / Australian National Univ., Canberra, ACT, Australia
ABSTRACT
Likelihood-based learning of graphical models faces challenges of computational complexity and robustness to model misspecification. This paper studies methods that fit parameters directly to maximize a measure of the accuracy of predicted marginals, taking into account both model and inference approximations at training time. Experiments on imaging problems suggest marginalization-based learning performs better than likelihood-based approximations on difficult problems where the model being fit is approximate in nature.
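The following toy example sketches the general idea of marginalization-based learning. The parameters are tuned so that the marginals produced by the approximate inference routine score well on the training labels. The same truncated inference is used during training and at prediction time.

Everything concrete here is a hypothetical illustration, not the paper's exact method:
- the 4-node cycle graph, the features and the two-parameter model;
- the fixed 5 iterations of loopy belief propagation;
- the per-node log loss on predicted marginals;
- all function names.

Gradients are taken by central finite differences. This is a plainly swapped-in stand-in for differentiating through the inference procedure.

```python
import math

# Hypothetical toy problem: a 4-node cycle (a loopy graph) with binary labels.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0)]
N = 4
NBRS = {i: [] for i in range(N)}
for a, b in EDGES:
    NBRS[a].append(b)
    NBRS[b].append(a)

def unary(theta, x, i, y):
    # Log-potential of label y at node i; s(y) = -1 for y=0, +1 for y=1.
    s = 1.0 if y == 1 else -1.0
    return theta[0] * x[i] * s

def pairwise(theta, y, yp):
    # Potts-style log-potential rewarding equal neighbouring labels.
    return theta[1] if y == yp else 0.0

def truncated_bp_marginals(theta, x, iters=5):
    """Approximate node marginals from a fixed number of loopy BP sweeps."""
    # msgs[(i, j)][yj]: message from node i to node j, initialised uniform.
    msgs = {}
    for i, j in EDGES:
        msgs[(i, j)] = [0.5, 0.5]
        msgs[(j, i)] = [0.5, 0.5]
    for _ in range(iters):
        new = {}
        for (i, j) in msgs:
            out = []
            for yj in (0, 1):
                total = 0.0
                for yi in (0, 1):
                    prod = math.exp(unary(theta, x, i, yi) + pairwise(theta, yi, yj))
                    for k in NBRS[i]:
                        if k != j:
                            prod *= msgs[(k, i)][yi]
                    total += prod
                out.append(total)
            z = sum(out)
            new[(i, j)] = [v / z for v in out]  # normalise each message
        msgs = new  # synchronous (parallel) update
    beliefs = []
    for i in range(N):
        bel = []
        for yi in (0, 1):
            v = math.exp(unary(theta, x, i, yi))
            for k in NBRS[i]:
                v *= msgs[(k, i)][yi]
            bel.append(v)
        z = sum(bel)
        beliefs.append([v / z for v in bel])
    return beliefs

def marginal_loss(theta, data):
    # Per-node log loss on the *approximate* marginals: -sum_i log b_i(y_i).
    loss = 0.0
    for x, y in data:
        bel = truncated_bp_marginals(theta, x)
        loss -= sum(math.log(bel[i][y[i]]) for i in range(N))
    return loss

def fd_gradient(f, theta, eps=1e-5):
    # Central finite differences: a simple stand-in for differentiating
    # through the message-passing iterations.
    g = []
    for k in range(len(theta)):
        tp = list(theta); tp[k] += eps
        tm = list(theta); tm[k] -= eps
        g.append((f(tp) - f(tm)) / (2 * eps))
    return g

def train(data, theta=(0.0, 0.0), lr=0.05, steps=100):
    # Plain gradient descent on the marginal-based loss.
    theta = list(theta)
    for _ in range(steps):
        g = fd_gradient(lambda t: marginal_loss(t, data), theta)
        theta = [t - lr * gk for t, gk in zip(theta, g)]
    return theta

# Hypothetical noisy data: features mostly agree with the labels.
DATA = [([0.9, 0.7, -0.2, 0.8], [1, 1, 1, 1]),
        ([-0.8, -0.6, 0.3, -0.9], [0, 0, 0, 0])]
theta = train(DATA)
print(theta, marginal_loss([0.0, 0.0], DATA), marginal_loss(theta, DATA))
```

The loss is evaluated on the beliefs of the same truncated inference used at test time. As a result, the learned parameters can partly compensate for the approximation error of that inference, rather than targeting a likelihood that the approximation cannot compute exactly.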
INDEX TERMS
Vectors, Entropy, Approximation algorithms, Optimization, Function approximation, Markov processes, segmentation, Graphical models, conditional random fields, machine learning, inference
CITATION
J. Domke, "Learning Graphical Model Parameters with Approximate Marginal Inference," IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 35, no. 10, pp. 2454-2467, Oct. 2013, doi:10.1109/TPAMI.2013.31