
H. Bourlard and C. J. Wellekens, "Links Between Markov Models and Multilayer Perceptrons," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no. 12, pp. 1167-1178, December 1990. doi: 10.1109/34.62605
Keywords: speech recognition; Markov models; multilayer perceptrons; connectionist system; discriminant hidden Markov model; Viterbi algorithm; least mean square minimization; error function; probability; Markov processes; minimisation; neural nets
The statistical use of a classic connectionist system, the multilayer perceptron (MLP), is described in the context of continuous speech recognition. A discriminant hidden Markov model (HMM) is defined, and it is shown how a particular MLP with contextual and extra feedback input units can be considered a general form of such a Markov model. A link is established between these discriminant HMMs, trained with the Viterbi algorithm, and any other approach based on least mean square minimization of an error function (LMSE). It is shown theoretically and experimentally that the outputs of the MLP, when trained with the LMSE or the entropy criterion, approximate the probability distribution over output classes conditioned on the input, i.e., the a posteriori class probabilities used in maximum a posteriori (MAP) classification. Results of a series of speech recognition experiments are reported. The possibility of embedding an MLP into an HMM is described. Relations with other recurrent networks are also explained.
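
The claim that LMSE-trained outputs approximate class posteriors can be illustrated with a small numerical sketch. This is not the paper's experiment. It assumes a toy setting with a handful of discrete input symbols, and it replaces the MLP with a lookup table holding one free output vector per input, which stands in for a network flexible enough to fit any mapping. The function names, the toy posteriors, and the learning rate are all hypothetical choices made for the illustration.

```python
import random


def make_samples(true_post, n, seed=0):
    """Draw (input, class) pairs; true_post[x][k] plays the role of P(class k | input x)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        x = rng.randrange(len(true_post))  # inputs drawn uniformly (assumption)
        k = rng.choices(range(len(true_post[x])), weights=true_post[x])[0]
        samples.append((x, k))
    return samples


def empirical_posteriors(samples, n_inputs, n_classes):
    """Relative class frequencies for each input symbol."""
    counts = [[0] * n_classes for _ in range(n_inputs)]
    for x, k in samples:
        counts[x][k] += 1
    return [[c / sum(row) for c in row] for row in counts]


def train_lmse(samples, n_inputs, n_classes, epochs=200, lr=0.5):
    """Batch gradient descent on the mean squared error against one-hot targets.

    out[x] is a free output vector per input symbol: a stand-in (assumption)
    for an MLP with enough capacity to represent any input-output mapping.
    """
    out = [[1.0 / n_classes] * n_classes for _ in range(n_inputs)]
    n = len(samples)
    for _ in range(epochs):
        grad = [[0.0] * n_classes for _ in range(n_inputs)]
        for x, k in samples:
            for j in range(n_classes):
                target = 1.0 if j == k else 0.0
                grad[x][j] += 2.0 * (out[x][j] - target) / n
        for x in range(n_inputs):
            for j in range(n_classes):
                out[x][j] -= lr * grad[x][j]
    return out


if __name__ == "__main__":
    true_post = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]  # toy values, not from the paper
    samples = make_samples(true_post, 600)
    trained = train_lmse(samples, n_inputs=3, n_classes=2)
    observed = empirical_posteriors(samples, n_inputs=3, n_classes=2)
    for x in range(3):
        print(x, [round(v, 4) for v in trained[x]], [round(v, 4) for v in observed[x]])
```

In this sketch the squared-error minimum for each input is the mean of its one-hot targets, so the trained outputs should match the empirical class frequencies, which in turn estimate the posteriors. A real MLP only approximates this behaviour to the extent that its capacity and training allow.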