Probabilistic Finite-State Machines-Part I
July 2005 (vol. 27 no. 7)
pp. 1013-1025
Probabilistic finite-state machines are used today in a variety of areas of pattern recognition, as well as in fields to which pattern recognition is linked: computational linguistics, machine learning, time series analysis, circuit testing, computational biology, speech recognition, and machine translation, among others. In Part I of this paper, we survey these generative objects and study their definitions and properties. In Part II, we will study the relation of probabilistic finite-state automata to other well-known devices that generate strings, such as hidden Markov models and n-grams, and provide theorems, algorithms, and properties that represent the current state of the art for these objects.
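The abstract describes probabilistic finite-state machines as generative devices that define probability distributions over strings. As a rough illustration only (the automaton below, its states, alphabet, and probabilities are invented for this sketch and do not come from the paper), a small PFA can be encoded as transition and final-probability tables, and the probability of a string computed with the forward algorithm by summing over all paths that emit it:

```python
# Hypothetical toy PFA over the alphabet {a, b}, for illustration only.
# Probabilities leaving each state (transitions plus stopping) sum to 1:
#   state 0: 0.5 + 0.2 + 0.1 + 0.2 = 1.0
#   state 1: 0.6 + 0.4 = 1.0

# transitions[state][symbol] = list of (next_state, probability)
transitions = {
    0: {"a": [(0, 0.5), (1, 0.2)], "b": [(1, 0.1)]},
    1: {"b": [(1, 0.6)]},
}
final = {0: 0.2, 1: 0.4}   # probability of stopping in each state
initial = {0: 1.0}         # single initial state

def string_probability(s):
    """Forward algorithm: sum the probability of every path emitting s."""
    alpha = dict(initial)  # alpha[q] = prob. of reaching q emitting prefix
    for sym in s:
        nxt = {}
        for q, p in alpha.items():
            for q2, tp in transitions.get(q, {}).get(sym, []):
                nxt[q2] = nxt.get(q2, 0.0) + p * tp
        alpha = nxt
    # multiply by the stopping probability of each reachable state
    return sum(p * final.get(q, 0.0) for q, p in alpha.items())
```

For example, `string_probability("a")` sums two paths (stay in state 0, then stop; move to state 1, then stop), giving 0.5·0.2 + 0.2·0.4 = 0.18. Replacing the sum over paths with a maximum would give the most likely single path instead (the Viterbi score).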

References:
[1] A. Paz, Introduction to Probabilistic Automata. New York: Academic Press, 1971.
[2] L. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,” Proc. IEEE, vol. 77, pp. 257-286, 1989.
[3] F. Jelinek, Statistical Methods for Speech Recognition. Cambridge, Mass.: MIT Press, 1998.
[4] R. Carrasco and J. Oncina, “Learning Stochastic Regular Grammars by Means of a State Merging Method,” Proc. Second Int'l Colloquium Grammatical Inference and Applications, pp. 139-152, 1994.
[5] L. Saul and F. Pereira, “Aggregate and Mixed-Order Markov Models for Statistical Language Processing,” Proc. Second Conf. Empirical Methods in Natural Language Processing, pp. 81-89, 1997.
[6] H. Ney, S. Martin, and F. Wessel, “Statistical Language Modeling Using Leaving-One-Out,” Corpus-Based Statistical Methods in Speech and Language Processing, S. Young and G. Bloothooft, eds., pp. 174-207, Kluwer Academic, 1997.
[7] D. Ron, Y. Singer, and N. Tishby, “Learning Probabilistic Automata with Variable Memory Length,” Proc. Seventh Ann. ACM Conf. Computational Learning Theory, pp. 35-46, 1994.
[8] M. Mohri, “Finite-State Transducers in Language and Speech Processing,” Computational Linguistics, vol. 23, no. 3, pp. 269-311, 1997.
[9] K.S. Fu, Syntactic Pattern Recognition and Applications. Prentice Hall, 1982.
[10] L. Miclet, Structural Methods in Pattern Recognition. Springer-Verlag, 1987.
[11] S. Lucas, E. Vidal, A. Amari, S. Hanlon, and J.C. Amengual, “A Comparison of Syntactic and Statistical Techniques for Off-Line OCR,” Proc. Second Int'l Colloquium on Grammatical Inference, pp. 168-179, 1994.
[12] D. Ron, Y. Singer, and N. Tishby, “On the Learnability and Usage of Acyclic Probabilistic Finite Automata,” Proc. Eighth Ann. Conf. Computational Learning Theory, pp. 31-40, 1995.
[13] H. Ney, “Stochastic Grammars and Pattern Recognition,” Proc. NATO Advanced Study Inst. “Speech Recognition and Understanding. Recent Advances, Trends, and Applications,” pp. 313-344, 1992.
[14] N. Abe and H. Mamitsuka, “Predicting Protein Secondary Structure Using Stochastic Tree Grammars,” Machine Learning, vol. 29, pp. 275-301, 1997.
[15] Y. Sakakibara, M. Brown, R. Hughley, I. Mian, K. Sjolander, R. Underwood, and D. Haussler, “Stochastic Context-Free Grammars for tRNA Modeling,” Nucleic Acids Research, vol. 22, pp. 5112-5120, 1994.
[16] R.B. Lyngsø, C.N.S. Pedersen, and H. Nielsen, “Metrics and Similarity Measures for Hidden Markov Models,” Proc. Intelligent Systems for Molecular Biology, 1999.
[17] R.B. Lyngsø and C.N.S. Pedersen, “Complexity of Comparing Hidden Markov Models,” Proc. 12th Ann. Int'l Symp. Algorithms and Computation, 2001.
[18] P. Cruz and E. Vidal, “Learning Regular Grammars to Model Musical Style: Comparing Different Coding Schemes,” Proc. Int'l Colloquium on Grammatical Inference, pp. 211-222, 1998.
[19] M.G. Thomason, “Regular Stochastic Syntax-Directed Translations,” Technical Report CS-76-17, Computer Science Dept., Univ. of Tennessee, Knoxville, 1976.
[20] M. Mohri, F. Pereira, and M. Riley, “The Design Principles of a Weighted Finite-State Transducer Library,” Theoretical Computer Science, vol. 231, pp. 17-32, 2000.
[21] H. Alshawi, S. Bangalore, and S. Douglas, “Learning Dependency Translation Models as Collections of Finite State Head Transducers,” Computational Linguistics, vol. 26, 2000.
[22] H. Alshawi, S. Bangalore, and S. Douglas, “Head Transducer Models for Speech Translation and Their Automatic Acquisition from Bilingual Data,” Machine Translation J., vol. 15, nos. 1-2, pp. 105-124, 2000.
[23] J.C. Amengual, J.M. Benedí, F. Casacuberta, A. Castaño, A. Castellanos, V.M. Jimenez, D. Llorens, A. Marzal, M. Pastor, F. Prat, E. Vidal, and J.M. Vilar, “The EUTRANS-I Speech Translation System,” Machine Translation J., vol. 15, nos. 1-2, pp. 75-103, 2000.
[24] S. Bangalore and G. Riccardi, “Stochastic Finite-State Models for Spoken Language Machine Translation,” Proc. Workshop Embedded Machine Translation Systems, NAACL, pp. 52-59, May 2000.
[25] S. Bangalore and G. Riccardi, “A Finite-State Approach to Machine Translation,” Proc. North Am. Assoc. Computational Linguistics, May 2001.
[26] F. Casacuberta, H. Ney, F.J. Och, E. Vidal, J.M. Vilar, S. Barrachina, I. Garcia-Varea, D. Llorens, C. Martinez, S. Molau, F. Nevado, M. Pastor, D. Picó, A. Sanchis, and C. Tillmann, “Some Approaches to Statistical and Finite-State Speech-to-Speech Translation,” Computer Speech and Language, 2003.
[27] L. Bréhélin, O. Gascuel, and G. Caraux, “Hidden Markov Models with Patterns to Learn Boolean Vector Sequences and Application to the Built-In Self-Test for Integrated Circuits,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 9, pp. 997-1008, Sept. 2001.
[28] Y. Bengio, V.-P. Lauzon, and R. Ducharme, “Experiments on the Application of IOHMMs to Model Financial Returns Series,” IEEE Trans. Neural Networks, vol. 12, no. 1, pp. 113-123, 2001.
[29] K.S. Fu, Syntactic Methods in Pattern Recognition. New York: Academic Press, 1974.
[30] J.J. Paradaens, “A General Definition of Stochastic Automata,” Computing, vol. 13, pp. 93-105, 1974.
[31] K.S. Fu and T.L. Booth, “Grammatical Inference: Introduction and Survey Parts I and II,” IEEE Trans. Systems, Man, and Cybernetics, vol. 5, pp. 59-72 and pp. 409-423, 1975.
[32] C.S. Wetherell, “Probabilistic Languages: A Review and Some Open Questions,” Computing Surveys, vol. 12, no. 4, 1980.
[33] F. Casacuberta, “Some Relations among Stochastic Finite State Networks Used in Automatic Speech Recognition,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 12, no. 7, pp. 691-695, July 1990.
[34] D. Angluin, “Identifying Languages from Stochastic Examples,” Technical Report YALEU/DCS/RR-614, Yale Univ., Mar. 1988.
[35] M. Kearns and L. Valiant, “Cryptographic Limitations on Learning Boolean Formulae and Finite Automata,” Proc. 21st ACM Symp. Theory of Computing, pp. 433-444, 1989.
[36] M. Kearns, Y. Mansour, D. Ron, R. Rubinfeld, R.E. Schapire, and L. Sellie, “On the Learnability of Discrete Distributions,” Proc. 25th Ann. ACM Symp. Theory of Computing, pp. 273-282, 1994.
[37] M. Kearns and U. Vazirani, An Introduction to Computational Learning Theory. MIT Press, 1994.
[38] N. Abe and M. Warmuth, “On the Computational Complexity of Approximating Distributions by Probabilistic Automata,” Proc. Third Workshop Computational Learning Theory, pp. 52-66, 1998.
[39] P. Dupont, F. Denis, and Y. Esposito, “Links between Probabilistic Automata and Hidden Markov Models: Probability Distributions, Learning Models and Induction Algorithms,” Pattern Recognition, 2004.
[40] E. Vidal, F. Thollard, C. de la Higuera, F. Casacuberta, and R.C. Carrasco, “Probabilistic Finite-State Automata— Part II,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 27, no. 7, pp. 1026-1039, July 2005.
[41] M.O. Rabin, “Probabilistic Automata,” Information and Control, vol. 6, no. 3, pp. 230-245, 1963.
[42] G.D. Forney, “The Viterbi Algorithm,” Proc. IEEE, vol. 61, no. 3, pp. 268-278, 1973.
[43] F. Casacuberta and C. de la Higuera, “Computational Complexity of Problems on Probabilistic Grammars and Transducers,” Proc. Fifth Int'l Colloquium on Grammatical Inference, pp. 15-24, 2000.
[44] R.C. Carrasco, “Accurate Computation of the Relative Entropy between Stochastic Regular Grammars,” RAIRO-Theoretical Informatics and Applications, vol. 31, no. 5, pp. 437-444, 1997.
[45] W.-G. Tzeng, “A Polynomial-Time Algorithm for the Equivalence of Probabilistic Automata,” SIAM J. Computing, vol. 21, no. 2, pp. 216-227, 1992.
[46] A. Fred, “Computation of Substring Probabilities in Stochastic Grammars,” Proc. Fifth Int'l Colloquium Grammatical Inference: Algorithms and Applications, pp. 103-114, 2000.
[47] M. Young-Lai and F.W. Tompa, “Stochastic Grammatical Inference of Text Database Structure,” Machine Learning, vol. 40, no. 2, pp. 111-137, 2000.
[48] D. Ron and R. Rubinfeld, “Learning Fallible Deterministic Finite Automata,” Machine Learning, vol. 18, pp. 149-185, 1995.
[49] C. Cook and A. Rosenfeld, “Some Experiments in Grammatical Inference,” NATO ASI Computer Orientation Learning Process, pp. 157-171, 1974.
[50] K. Knill and S. Young, “Hidden Markov Models in Speech and Language Processing,” Corpus-Based Statistical Methods in Speech and Language Processing, S. Young and G. Bloothooft, eds., pp. 27-68, Kluwer Academic, 1997.
[51] N. Merhav and Y. Ephraim, “Hidden Markov Modeling Using a Dominant State Sequence with Application to Speech Recognition,” Computer Speech and Language, vol. 5, pp. 327-339, 1991.
[52] N. Merhav and Y. Ephraim, “Maximum Likelihood Hidden Markov Modeling Using a Dominant Sequence of States,” IEEE Trans. Signal Processing, vol. 39, no. 9, pp. 2111-2115, 1991.
[53] R.G. Gallager, Discrete Stochastic Processes. Kluwer Academic, 1996.
[54] V.C.V.D. Blondel, “Undecidable Problems for Probabilistic Automata of Fixed Dimension,” Theory of Computing Systems, vol. 36, no. 3, pp. 231-245, 2003.
[55] M.H. Harrison, Introduction to Formal Language Theory. Reading, Mass.: Addison-Wesley, 1978.
[56] C. de la Higuera, “Characteristic Sets for Polynomial Grammatical Inference,” Machine Learning, vol. 27, pp. 125-138, 1997.
[57] R. Carrasco and J. Oncina, “Learning Deterministic Regular Grammars from Stochastic Samples in Polynomial Time,” RAIRO-Theoretical Informatics and Applications, vol. 33, no. 1, pp. 1-20, 1999.
[58] C. de la Higuera, “Why ε-Transitions Are Not Necessary in Probabilistic Finite Automata,” Technical Report 0301, EURISE, Univ. of Saint-Etienne, 2003.
[59] T. Cover and J. Thomas, Elements of Information Theory. Wiley Interscience, 1991.
[60] J. Goodman, “A Bit of Progress in Language Modeling,” technical report, Microsoft Research, 2001.
[61] R. Kneser and H. Ney, “Improved Clustering Techniques for Class-Based Language Modelling,” Proc. European Conf. Speech Comm. and Technology, pp. 973-976, 1993.
[62] P. Brown, V. Della Pietra, P. deSouza, J. Lai, and R. Mercer, “Class-Based N-Gram Models of Natural Language,” Computational Linguistics, vol. 18, no. 4, pp. 467-479, 1992.

Index Terms:
Automata, classes defined by grammars or automata, machine learning, language acquisition, language models, language parsing and understanding, machine translation, speech recognition and synthesis, structural pattern recognition, syntactic pattern recognition.
Citation:
Enrique Vidal, Franck Thollard, Colin de la Higuera, Francisco Casacuberta, Rafael C. Carrasco, "Probabilistic Finite-State Machines-Part I," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 7, pp. 1013-1025, July 2005, doi:10.1109/TPAMI.2005.147