IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 34, no. 2, February 2012

pp. 292-301

Ralf Schlueter, Comput. Sci. Dept., RWTH Aachen Univ., Aachen, Germany

Markus Nussbaum-Thom, Comput. Sci. Dept., RWTH Aachen Univ., Aachen, Germany

Hermann Ney, Comput. Sci. Dept., RWTH Aachen Univ., Aachen, Germany

ABSTRACT

In many tasks in pattern recognition, such as automatic speech recognition (ASR), optical character recognition (OCR), part-of-speech (POS) tagging, and other string recognition tasks, we are faced with a well-known inconsistency: The Bayes decision rule is usually used to minimize string (symbol sequence) error, whereas, in practice, we want to minimize symbol (word, character, tag, etc.) error. When comparing different recognition systems, we do indeed use symbol error rate as an evaluation measure. The topic of this work is to analyze the relation between string (i.e., 0-1) and symbol error (i.e., metric, integer valued) cost functions in the Bayes decision rule, for which fundamental analytic results are derived. Simple conditions are derived for which the Bayes decision rule with integer-valued metric cost function and with 0-1 cost gives the same decisions or leads to classes with limited cost. The corresponding conditions can be tested with complexity linear in the number of classes. The results obtained do not make any assumption w.r.t. the structure of the underlying distributions or the classification problem. Nevertheless, the general analytic results are analyzed via simulations of string recognition problems with Levenshtein (edit) distance cost function. The results support earlier findings that considerable improvements are to be expected when initial error rates are high.
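The inconsistency described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the toy posterior, the function names (`levenshtein`, `map_decision`, `expected_cost`, `min_risk_decision`), and the restriction of the candidate set to the strings that carry posterior mass are all assumptions made for illustration. The 0-1 cost leads to picking the most probable string, while the Levenshtein cost leads to picking the string with the smallest expected edit distance.

```python
from typing import Dict, Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Edit distance between two symbol sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution or match
        prev = curr
    return prev[-1]


def map_decision(posterior: Dict[str, float]) -> str:
    """Bayes decision under 0-1 (string error) cost: the most probable string."""
    return max(posterior, key=posterior.get)


def expected_cost(candidate: str, posterior: Dict[str, float]) -> float:
    """Expected Levenshtein cost of choosing `candidate` under the posterior."""
    return sum(p * levenshtein(candidate, ref) for ref, p in posterior.items())


def min_risk_decision(posterior: Dict[str, float]) -> str:
    """Bayes decision under Levenshtein (symbol error) cost.

    Assumption: candidates are limited to strings with nonzero posterior.
    """
    return min(posterior, key=lambda c: expected_cost(c, posterior))


# Hypothetical posterior over three two-symbol strings (illustrative values).
posterior = {"aa": 0.40, "bc": 0.35, "bd": 0.25}

if __name__ == "__main__":
    print("0-1 cost decision:        ", map_decision(posterior))
    print("Levenshtein cost decision:", min_risk_decision(posterior))
```

In this toy case the two rules disagree: "aa" is the single most probable string, but "bc" has the lower expected edit distance because it is close to "bd". If the posterior of "aa" is raised to 0.6 (with 0.25 and 0.15 for the others), both rules select "aa", which matches the intuition that the choice of cost function matters mostly when the posterior is spread out.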

INDEX TERMS

pattern recognition, Bayes methods, string recognition, Bayes decision rule, automatic speech recognition, ASR, optical character recognition, OCR, part-of-speech, POS, symbol sequence, error rate symbol, distance cost function, cost function, speech recognition, error analysis, measurement uncertainty, statistical analysis, Bayesian methods, cost/loss function, statistical pattern recognition, classifier design and evaluation

CITATION

Ralf Schlueter, Markus Nussbaum-Thom, Hermann Ney, "Does the Cost Function Matter in Bayes Decision Rule?", *IEEE Transactions on Pattern Analysis & Machine Intelligence*, vol. 34, no. 2, pp. 292-301, February 2012, doi:10.1109/TPAMI.2011.163
