Links Between Markov Models and Multilayer Perceptrons
December 1990 (vol. 12 no. 12)
pp. 1167-1178

The statistical use of a classic connectionist system, the multilayer perceptron (MLP), is described in the context of continuous speech recognition. A discriminant hidden Markov model (HMM) is defined, and it is shown how a particular MLP with contextual and extra feedback input units can be considered a general form of such a Markov model. A link is established between these discriminant HMMs, trained with the Viterbi algorithm, and any other approach based on least mean square minimization of an error function (LMSE). It is shown theoretically and experimentally that the outputs of the MLP, when trained with the LMSE or the entropy criterion, approximate the probability distribution over output classes conditioned on the input, i.e., the maximum a posteriori (MAP) probabilities. Results of a series of speech recognition experiments are reported. The possibility of embedding MLPs into HMMs is described, and relations with other recurrent networks are explained.
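The claim that LMSE training yields class posteriors can be illustrated with a minimal sketch. This is not the paper's experimental setup: the "network" here is a hypothetical lookup table with one free output per (input symbol, class) pair, standing in for an MLP with enough capacity, and the data set, function names, learning rate, and epoch count are all assumptions made for illustration. Minimizing the squared error against one-hot class targets should drive each output toward the relative frequency of that class given the input.

```python
from collections import Counter

# Hypothetical toy data: (input symbol, class label) pairs with known counts.
SAMPLES = (
    [(0, 0)] * 8 + [(0, 1)] * 2
    + [(1, 0)] * 5 + [(1, 1)] * 5
    + [(2, 0)] * 1 + [(2, 1)] * 9
)


def empirical_posteriors(samples, n_classes):
    """Relative frequency of each class given each input symbol."""
    joint = Counter(samples)
    inputs = sorted({x for x, _ in samples})
    post = {}
    for x in inputs:
        total = sum(joint[(x, k)] for k in range(n_classes))
        post[x] = [joint[(x, k)] / total for k in range(n_classes)]
    return post


def train_lmse_table(samples, n_classes, lr=0.5, epochs=200):
    """Batch gradient descent on the mean squared error with one-hot targets.

    The model is a lookup table (an assumption standing in for an MLP):
    out[x][j] is the output for class j when the input is x.
    """
    inputs = sorted({x for x, _ in samples})
    out = {x: [0.0] * n_classes for x in inputs}
    n = len(samples)
    for _ in range(epochs):
        # Accumulate the gradient of the mean squared error over the batch.
        grad = {x: [0.0] * n_classes for x in inputs}
        for x, k in samples:
            for j in range(n_classes):
                target = 1.0 if j == k else 0.0
                grad[x][j] += 2.0 * (out[x][j] - target) / n
        # Take one gradient step for every table entry.
        for x in inputs:
            for j in range(n_classes):
                out[x][j] -= lr * grad[x][j]
    return out


if __name__ == "__main__":
    trained = train_lmse_table(SAMPLES, n_classes=2)
    reference = empirical_posteriors(SAMPLES, n_classes=2)
    for x in sorted(trained):
        print(x, trained[x], reference[x])
```

In this toy setting the trained outputs for each input also sum to one without any explicit normalization, because the one-hot targets do. A real MLP only approximates this behavior, to the extent that it has enough parameters and training reaches a good minimum.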


Index Terms:
speech recognition; Markov models; multilayer perceptrons; connectionist system; discriminant hidden Markov model; Viterbi algorithm; least mean square minimization; error function; probability; Markov processes; minimisation; neural nets
H. Bourlard, C.J. Wellekens, "Links Between Markov Models and Multilayer Perceptrons," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no. 12, pp. 1167-1178, Dec. 1990, doi:10.1109/34.62605