Latent Log-Linear Models for Handwritten Digit Classification
June 2012 (vol. 34 no. 6)
pp. 1105-1117
T. Gass, Comput. Vision Lab., ETH Zurich, Zurich, Switzerland
T. Deselaers, Google Switzerland, Zurich, Switzerland
G. Heigold, Google Inc., Mountain View, CA, USA
H. Ney, Lehrstuhl für Informatik 6, RWTH Aachen, Aachen, Germany
We present latent log-linear models, an extension of log-linear models that incorporates latent variables, and we propose two applications thereof: log-linear mixture models and image deformation-aware log-linear models. The resulting models are fully discriminative, can be trained efficiently, and their complexity can be controlled. Log-linear mixture models offer additional flexibility within the log-linear modeling framework. Unlike previous approaches, the image deformation-aware model considers image deformations directly and allows discriminative training of the deformation parameters. Both models are trained using alternating optimization. For certain variants, convergence to a stationary point is guaranteed; in practice, even variants without this guarantee converge and find models that perform well. We tune the methods on the USPS data set and evaluate them on the MNIST data set, demonstrating the generalization capabilities of the proposed models. Although our models use significantly fewer parameters, they obtain results competitive with models proposed in the literature.
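To illustrate the idea behind a log-linear mixture model trained by alternating optimization, the sketch below fits a small model of the form p(c|x) determined by per-class component scores alpha_cm + lambda_cm^T x, collapsing components with a maximum approximation. This is a minimal NumPy sketch on synthetic XOR-like data, not the authors' implementation; the data generator, learning rate, and component count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: each class is a mixture of two well-separated clusters,
# so a single (non-mixture) log-linear model cannot separate them.
def make_data(n=100):
    X0 = np.vstack([rng.normal([-2, -2], 0.3, (n, 2)),
                    rng.normal([ 2,  2], 0.3, (n, 2))])
    X1 = np.vstack([rng.normal([-2,  2], 0.3, (n, 2)),
                    rng.normal([ 2, -2], 0.3, (n, 2))])
    return np.vstack([X0, X1]), np.array([0] * (2 * n) + [1] * (2 * n))

C, M, D = 2, 2, 2                      # classes, components per class, features
W = rng.normal(0, 0.1, (C, M, D))      # weights lambda_{cm}
b = np.zeros((C, M))                   # biases alpha_{cm}

def scores(X):
    # g[n, c, m] = alpha_{cm} + lambda_{cm}^T x_n
    return np.einsum('cmd,nd->ncm', W, X) + b

X, y = make_data()
N, lr = len(y), 0.1
for _ in range(200):
    g = scores(X)
    # Latent step (maximum approximation): each example is assigned
    # to the best-scoring component of each class.
    m_hat = g.argmax(axis=2)                         # [N, C]
    # Model step: one gradient ascent step on the conditional
    # log-likelihood with components collapsed by max over m.
    cls = g.max(axis=2)                              # class scores [N, C]
    p = np.exp(cls - cls.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)                # posteriors p(c|x_n)
    for n in range(N):
        for c in range(C):
            coeff = (1.0 if c == y[n] else 0.0) - p[n, c]
            W[c, m_hat[n, c]] += lr * coeff * X[n] / N
            b[c, m_hat[n, c]] += lr * coeff / N

pred = scores(X).max(axis=2).argmax(axis=1)
acc = (pred == y).mean()
```

The alternation is between fixing the latent component assignments (making the objective that of an ordinary log-linear model) and updating the log-linear parameters by gradient ascent; with the maximum approximation, each component specializes to one cluster of its class.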

[1] J. Anderson, "Logistic Discrimination," Handbook of Statistics 2, P.R. Krishnaiah and L.N. Kanal, eds., pp. 169-191, North-Holland, 1982.
[2] O. Barndorff-Nielsen and P. Jupp, "Approximating Exponential Models," Annals Inst. of Statistical Math., vol. 41, no. 2, pp. 247-267, 1988.
[3] O. Bender, F. Och, and H. Ney, "Maximum Entropy Models for Named Entity Recognition," Proc. Seventh Conf. Computational Natural Language Learning HLT-NAACL, pp. 148-152, May 2003.
[4] J.C. Bezdek and R.J. Hathaway, "Convergence of Alternating Optimization," Neural Parallel Scientific Computation, vol. 11, no. 4, pp. 351-368, 2003.
[5] O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee, "Choosing Multiple Parameters for Support Vector Machines," Machine Learning, vol. 46, no. 1, pp. 131-159, 2002.
[6] J.N. Darroch and D. Ratcliff, "Generalized Iterative Scaling for Log-Linear Models," The Annals of Math. Statistics, vol. 43, no. 5, pp. 1470-1480, Oct. 1972.
[7] D. DeCoste and B. Schölkopf, "Training Invariant Support Vector Machines," Machine Learning, vol. 46, nos. 1-3, pp. 161-190, 2002.
[8] P.F. Felzenszwalb, R.B. Girshick, D. McAllester, and D. Ramanan, "Object Detection with Discriminatively Trained Part Based Models," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 32, no. 9, pp. 1627-1645, Sept. 2010.
[9] T. Gass, T. Deselaers, and H. Ney, "Deformation-Aware Log-Linear Models," Proc. 31st DAGM Symp. Pattern Recognition, Sept. 2009.
[10] P.V. Gehler and S. Nowozin, "Let the Kernel Figure It Out: Principled Learning of Pre-Processing for Kernel Classifiers," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2009.
[11] A. Gunawardana and W. Byrne, "Convergence Theorems for Generalized Alternating Minimization Procedures," J. Machine Learning Research, vol. 6, pp. 2049-2073, 2005.
[12] A. Gunawardana, M. Mahajan, A. Acero, and J.C. Platt, "Hidden Conditional Random Fields for Phone Classification," Proc. Int'l Conf. Spoken Language Processing, pp. 117-120, Sept. 2005.
[13] B. Haasdonk, "Transformation Knowledge in Pattern Analysis with Kernel Methods," PhD thesis, Albert-Ludwigs-Universität Freiburg, 2005.
[14] B. Haasdonk, "Feature Space Interpretation of SVMs with Indefinite Kernels," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 27, no. 4, pp. 482-492, Apr. 2005.
[15] B. Haasdonk and D. Keysers, "Tangent Distance Kernels for Support Vector Machines," Proc. 16th Int'l Conf. Pattern Recognition, pp. 864-868, Sept. 2002.
[16] G. Heigold, P. Lehnen, R. Schlueter, and H. Ney, "On the Equivalence of Gaussian and Log-Linear HMMs," Proc. Interspeech, Sept. 2008.
[17] G. Hinton, S. Osindero, and Y.-W. Teh, "A Fast Learning Algorithm for Deep Belief Nets," Neural Computation, vol. 18, pp. 1527-1554, 2006.
[18] T. Jebara, Machine Learning: Discriminative and Generative. Kluwer, 2003.
[19] D. Keysers, W. Macherey, H. Ney, and J. Dahmen, "Adaptation in Statistical Pattern Recognition Using Tangent Vectors," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 26, no. 2, pp. 269-274, Feb. 2004.
[20] D. Keysers, T. Deselaers, C. Gollan, and H. Ney, "Deformation Models for Image Recognition," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29, no. 8, pp. 1422-1435, Aug. 2007.
[21] S. Kullback, "Estimating and Testing Interaction Parameters in the Log-Linear Model," unpublished manuscript, 1971.
[22] J. Lafferty, A. McCallum, and F. Pereira, "Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data," Proc. 18th Int'l Conf. Machine Learning, 2001.
[23] Handbook of Latent Semantic Analysis, T.K. Landauer, D. McNamara, S. Dennis, and W. Kintsch, eds. Lawrence Erlbaum Associates, 2007.
[24] Y. LeCun and C. Cortes, "The MNIST Database of Handwritten Digits," http://yann.lecun.com/exdb/mnist/, 2011.
[25] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-Based Learning Applied to Document Recognition," Proc. IEEE, vol. 86, no. 11, pp. 2278-2324, Nov. 1998.
[26] D. Liu and J. Nocedal, "On the Limited Memory BFGS Method for Large Scale Optimization," Math. Programming: Series A and B, vol. 45, no. 3, pp. 503-528, 1989.
[27] R. Memisevic and G. Hinton, "Unsupervised Learning of Image Transformations," Proc. IEEE Conf. Computer Vision and Pattern Recognition, June 2007.
[28] T.P. Minka, "A Comparison of Numerical Optimizers for Logistic Regression," technical report, Microsoft Research, Oct. 2003, revised Nov. 2004.
[29] S. Mori, K. Yamamoto, and M. Yasuda, "Research on Machine Recognition of Handprinted Characters," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 6, no. 4, pp. 386-405, July 1984.
[30] F. Och and H. Ney, "Discriminative Training and Maximum Entropy Models for Statistical Machine Translation," Proc. 40th Ann. Meeting Assoc. for Computational Linguistics, pp. 295-302, July 2002.
[31] A. Quattoni, S. Wang, L.-P. Morency, M. Collins, and T. Darrell, "Hidden Conditional Random Fields," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29, no. 10, pp. 1848-1852, Oct. 2007.
[32] L.R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proc. IEEE, vol. 77, no. 2, pp. 257-286, Feb. 1989.
[33] L.K. Saul and D.D. Lee, "Multiplicative Updates for Classification by Mixture Models," Proc. Neural Information Processing Systems Conf., vol. 14, pp. 897-904, 2002.
[34] B. Schölkopf, "The USPS Data Set," ftp://ftp.kyb.tuebingen. /, 2010.
[35] B. Schölkopf and A.J. Smola, Learning with Kernels. MIT Press, 2002.
[36] P.Y. Simard, D. Steinkraus, and J.C. Platt, "Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis," Proc. Seventh Int'l Conf. Document Analysis and Recognition, pp. 958-962, Aug. 2003.
[37] S. Uchida and H. Sakoe, "A Survey of Elastic Matching Techniques for Handwritten Character Recognition," IEICE Trans. Information and Systems, vol. E88-D, no. 8, pp. 1781-1790, 2005.
[38] T. Weyand, T. Deselaers, and H. Ney, "Log-Linear Mixtures for Object Class Recognition," Proc. British Machine Vision Conf., 2009.

Index Terms:
support vector machines, handwritten character recognition, image classification, regression analysis, discriminative deformation parameter training, latent log-linear mixture models, handwritten digit classification, image deformation-aware log-linear models, stationary point convergence, USPS data set, MNIST data set, training, deformable models, hidden Markov models, kernels, approximation methods, data models, numerical models, log-linear models, latent variables, conditional random fields, OCR
T. Gass, T. Deselaers, G. Heigold, H. Ney, "Latent Log-Linear Models for Handwritten Digit Classification," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 6, pp. 1105-1117, June 2012, doi:10.1109/TPAMI.2011.218