Probabilistic Finite-State Machines-Part I
July 2005 (vol. 27, no. 7), pp. 1013-1025
Probabilistic finite-state machines are used today in a variety of areas of pattern recognition, as well as in fields to which pattern recognition is linked: computational linguistics, machine learning, time series analysis, circuit testing, computational biology, speech recognition, and machine translation, among others. In Part I of this paper, we survey these generative objects and study their definitions and properties. In Part II, we will study the relation of probabilistic finite-state automata to other well-known string-generating devices, such as hidden Markov models and n-grams, and provide theorems, algorithms, and properties that represent the current state of the art for these objects.
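The abstract describes probabilistic finite-state automata as generative devices: each state carries a stopping probability, and the remaining probability mass is spread over symbol-labeled transitions, so the machine defines a distribution over strings. A minimal sketch of this generative process, with an illustrative two-state machine whose states, symbols, and probabilities are invented for the example (not taken from the paper):

```python
import random

# Illustrative PFA: two states, alphabet {a, b}.
# transitions[state] -> list of (probability, symbol, next_state)
transitions = {
    "q0": [(0.5, "a", "q0"), (0.2, "b", "q1")],
    "q1": [(0.4, "b", "q1")],
}
# Stopping (final) probability of each state; together with the
# outgoing transition probabilities it sums to 1 per state.
final = {"q0": 0.3, "q1": 0.6}

def sample(start="q0"):
    """Generate one string from the distribution defined by the PFA."""
    state, out = start, []
    while True:
        r = random.random()
        if r < final[state]:          # stop with the state's final probability
            return "".join(out)
        r -= final[state]
        for p, sym, nxt in transitions[state]:
            if r < p:                 # otherwise pick a transition
                out.append(sym)
                state = nxt
                break
            r -= p

print(sample())
```

With this particular machine, every generated string has the form a…ab…b, since only q0 emits a's and q1 can never return to q0.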

[1] A. Paz, Introduction to Probabilistic Automata. New York: Academic Press, 1971.
[2] L. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,” Proc. IEEE, vol. 77, pp. 257-286, 1989.
[3] F. Jelinek, Statistical Methods for Speech Recognition. Cambridge, Mass.: MIT Press, 1998.
[4] R. Carrasco and J. Oncina, “Learning Stochastic Regular Grammars by Means of a State Merging Method,” Proc. Second Int'l Colloquium Grammatical Inference and Applications, pp. 139-152, 1994.
[5] L. Saul and F. Pereira, “Aggregate and Mixed-Order Markov Models for Statistical Language Processing,” Proc. Second Conf. Empirical Methods in Natural Language Processing, pp. 81-89, 1997.
[6] H. Ney, S. Martin, and F. Wessel, “Statistical Language Modeling Using Leaving-One-Out,” Corpus-Based Statistical Methods in Speech and Language Processing, S. Young and G. Bloothooft, eds., pp. 174-207, Kluwer Academic, 1997.
[7] D. Ron, Y. Singer, and N. Tishby, “Learning Probabilistic Automata with Variable Memory Length,” Proc. Seventh Ann. ACM Conf. Computational Learning Theory, pp. 35-46, 1994.
[8] M. Mohri, “Finite-State Transducers in Language and Speech Processing,” Computational Linguistics, vol. 23, no. 3, pp. 269-311, 1997.
[9] K.S. Fu, Syntactic Pattern Recognition and Applications. Prentice Hall, 1982.
[10] L. Miclet, Structural Methods in Pattern Recognition. Springer-Verlag, 1987.
[11] S. Lucas, E. Vidal, A. Amari, S. Hanlon, and J.C. Amengual, “A Comparison of Syntactic and Statistical Techniques for Off-Line OCR,” Proc. Second Int'l Colloquium on Grammatical Inference, pp. 168-179, 1994.
[12] D. Ron, Y. Singer, and N. Tishby, “On the Learnability and Usage of Acyclic Probabilistic Finite Automata,” Proc. Eighth Ann. Conf. Computational Learning Theory, pp. 31-40, 1995.
[13] H. Ney, “Stochastic Grammars and Pattern Recognition,” Proc. NATO Advanced Study Inst. “Speech Recognition and Understanding. Recent Advances, Trends, and Applications,” pp. 313-344, 1992.
[14] N. Abe and H. Mamitsuka, “Predicting Protein Secondary Structure Using Stochastic Tree Grammars,” Machine Learning, vol. 29, pp. 275-301, 1997.
[15] Y. Sakakibara, M. Brown, R. Hughley, I. Mian, K. Sjolander, R. Underwood, and D. Haussler, “Stochastic Context-Free Grammars for tRNA Modeling,” Nucleic Acids Research, vol. 22, pp. 5112-5120, 1994.
[16] R.B. Lyngsø, C.N.S. Pedersen, and H. Nielsen, “Metrics and Similarity Measures for Hidden Markov Models,” Proc. Intelligent Systems for Molecular Biology, 1999.
[17] R.B. Lyngsø and C.N.S. Pedersen, “Complexity of Comparing Hidden Markov Models,” Proc. 12th Ann. Int'l Symp. Algorithms and Computation, 2001.
[18] P. Cruz and E. Vidal, “Learning Regular Grammars to Model Musical Style: Comparing Different Coding Schemes,” Proc. Int'l Colloquium on Grammatical Inference, pp. 211-222, 1998.
[19] M.G. Thomason, “Regular Stochastic Syntax-Directed Translations,” Technical Report CS-76-17, Computer Science Dept., Univ. of Tennessee, Knoxville, 1976.
[20] M. Mohri, F. Pereira, and M. Riley, “The Design Principles of a Weighted Finite-State Transducer Library,” Theoretical Computer Science, vol. 231, pp. 17-32, 2000.
[21] H. Alshawi, S. Bangalore, and S. Douglas, “Learning Dependency Translation Models as Collections of Finite State Head Transducers,” Computational Linguistics, vol. 26, 2000.
[22] H. Alshawi, S. Bangalore, and S. Douglas, “Head Transducer Models for Speech Translation and Their Automatic Acquisition from Bilingual Data,” Machine Translation J., vol. 15, nos. 1-2, pp. 105-124, 2000.
[23] J.C. Amengual, J.M. Benedí, F. Casacuberta, A. Castaño, A. Castellanos, V.M. Jimenez, D. Llorens, A. Marzal, M. Pastor, F. Prat, E. Vidal, and J.M. Vilar, “The EUTRANS-I Speech Translation System,” Machine Translation J., vol. 15, nos. 1-2, pp. 75-103, 2000.
[24] S. Bangalore and G. Riccardi, “Stochastic Finite-State Models for Spoken Language Machine Translation,” Proc. Workshop Embedded Machine Translation Systems, NAACL, pp. 52-59, May 2000.
[25] S. Bangalore and G. Riccardi, “A Finite-State Approach to Machine Translation,” Proc. North Am. Assoc. Computational Linguistics, May 2001.
[26] F. Casacuberta, H. Ney, F.J. Och, E. Vidal, J.M. Vilar, S. Barrachina, I. Garcia-Varea, D. Llorens, C. Martinez, S. Molau, F. Nevado, M. Pastor, D. Picó, A. Sanchis, and C. Tillmann, “Some Approaches to Statistical and Finite-State Speech-to-Speech Translation,” Computer Speech and Language, 2003.
[27] L. Bréhélin, O. Gascuel, and G. Caraux, “Hidden Markov Models with Patterns to Learn Boolean Vector Sequences and Application to the Built-In Self-Test for Integrated Circuits,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 9, pp. 997-1008, Sept. 2001.
[28] Y. Bengio, V.-P. Lauzon, and R. Ducharme, “Experiments on the Application of IOHMMs to Model Financial Returns Series,” IEEE Trans. Neural Networks, vol. 12, no. 1, pp. 113-123, 2001.
[29] K.S. Fu, Syntactic Methods in Pattern Recognition. New York: Academic Press, 1974.
[30] J.J. Paradaens, “A General Definition of Stochastic Automata,” Computing, vol. 13, pp. 93-105, 1974.
[31] K.S. Fu and T.L. Booth, “Grammatical Inference: Introduction and Survey Parts I and II,” IEEE Trans. Systems, Man, and Cybernetics, vol. 5, pp. 59-72 and pp. 409-423, 1975.
[32] C.S. Wetherell, “Probabilistic Languages: A Review and Some Open Questions,” Computing Surveys, vol. 12, no. 4, 1980.
[33] F. Casacuberta, “Some Relations among Stochastic Finite State Networks Used in Automatic Speech Recognition,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 12, no. 7, pp. 691-695, July 1990.
[34] D. Angluin, “Identifying Languages from Stochastic Examples,” Technical Report YALEU/DCS/RR-614, Yale Univ., Mar. 1988.
[35] M. Kearns and L. Valiant, “Cryptographic Limitations on Learning Boolean Formulae and Finite Automata,” Proc. 21st ACM Symp. Theory of Computing, pp. 433-444, 1989.
[36] M. Kearns, Y. Mansour, D. Ron, R. Rubinfeld, R.E. Schapire, and L. Sellie, “On the Learnability of Discrete Distributions,” Proc. 25th Ann. ACM Symp. Theory of Computing, pp. 273-282, 1994.
[37] M. Kearns and U. Vazirani, An Introduction to Computational Learning Theory. MIT Press, 1994.
[38] N. Abe and M. Warmuth, “On the Computational Complexity of Approximating Distributions by Probabilistic Automata,” Proc. Third Workshop Computational Learning Theory, pp. 52-66, 1998.
[39] P. Dupont, F. Denis, and Y. Esposito, “Links between Probabilistic Automata and Hidden Markov Models: Probability Distributions, Learning Models and Induction Algorithms,” Pattern Recognition, 2004.
[40] E. Vidal, F. Thollard, C. de la Higuera, F. Casacuberta, and R.C. Carrasco, “Probabilistic Finite-State Automata— Part II,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 27, no. 7, pp. 1026-1039, July 2005.
[41] M.O. Rabin, “Probabilistic Automata,” Information and Control, vol. 6, no. 3, pp. 230-245, 1963.
[42] G.D. Forney, “The Viterbi Algorithm,” Proc. IEEE, vol. 61, no. 3, pp. 268-278, 1973.
[43] F. Casacuberta and C. de la Higuera, “Computational Complexity of Problems on Probabilistic Grammars and Transducers,” Proc. Fifth Int'l Colloquium on Grammatical Inference, pp. 15-24, 2000.
[44] R.C. Carrasco, “Accurate Computation of the Relative Entropy between Stochastic Regular Grammars,” RAIRO-Theoretical Informatics and Applications, vol. 31, no. 5, pp. 437-444, 1997.
[45] W.-G. Tzeng, “A Polynomial-Time Algorithm for the Equivalence of Probabilistic Automata,” SIAM J. Computing, vol. 21, no. 2, pp. 216-227, 1992.
[46] A. Fred, “Computation of Substring Probabilities in Stochastic Grammars,” Proc. Fifth Int'l Colloquium Grammatical Inference: Algorithms and Applications, pp. 103-114, 2000.
[47] M. Young-Lai and F.W. Tompa, “Stochastic Grammatical Inference of Text Database Structure,” Machine Learning, vol. 40, no. 2, pp. 111-137, 2000.
[48] D. Ron and R. Rubinfeld, “Learning Fallible Deterministic Finite Automata,” Machine Learning, vol. 18, pp. 149-185, 1995.
[49] C. Cook and A. Rosenfeld, “Some Experiments in Grammatical Inference,” NATO ASI Computer Orientation Learning Process, pp. 157-171, 1974.
[50] K. Knill and S. Young, “Hidden Markov Models in Speech and Language Processing,” Corpus-Based Statistical Methods in Speech and Language Processing, S. Young and G. Bloothooft, eds., pp. 27-68, Kluwer Academic, 1997.
[51] N. Merhav and Y. Ephraim, “Hidden Markov Modeling Using a Dominant State Sequence with Application to Speech Recognition,” Computer Speech and Language, vol. 5, pp. 327-339, 1991.
[52] N. Merhav and Y. Ephraim, “Maximum Likelihood Hidden Markov Modeling Using a Dominant Sequence of States,” IEEE Trans. Signal Processing, vol. 39, no. 9, pp. 2111-2115, 1991.
[53] R.G. Gallager, Discrete Stochastic Processes. Kluwer Academic, 1996.
[54] V.D. Blondel and V. Canterini, “Undecidable Problems for Probabilistic Automata of Fixed Dimension,” Theory of Computing Systems, vol. 36, no. 3, pp. 231-245, 2003.
[55] M.H. Harrison, Introduction to Formal Language Theory. Reading, Mass.: Addison-Wesley, 1978.
[56] C. de la Higuera, “Characteristic Sets for Polynomial Grammatical Inference,” Machine Learning, vol. 27, pp. 125-138, 1997.
[57] R. Carrasco and J. Oncina, “Learning Deterministic Regular Grammars from Stochastic Samples in Polynomial Time,” RAIRO-Theoretical Informatics and Applications, vol. 33, no. 1, pp. 1-20, 1999.
[58] C. de la Higuera, “Why ε-Transitions Are Not Necessary in Probabilistic Finite Automata,” Technical Report 0301, EURISE, Univ. of Saint-Etienne, 2003.
[59] T. Cover and J. Thomas, Elements of Information Theory. Wiley Interscience, 1991.
[60] J. Goodman, “A Bit of Progress in Language Modeling,” technical report, Microsoft Research, 2001.
[61] R. Kneser and H. Ney, “Improved Clustering Techniques for Class-Based Language Modelling,” Proc. European Conf. Speech Comm. and Technology, pp. 973-976, 1993.
[62] P. Brown, V. Della Pietra, P. deSouza, J. Lai, and R. Mercer, “Class-Based N-Gram Models of Natural Language,” Computational Linguistics, vol. 18, no. 4, pp. 467-479, 1992.

Index Terms:
Automata, classes defined by grammars or automata, machine learning, language acquisition, language models, language parsing and understanding, machine translation, speech recognition and synthesis, structural pattern recognition, syntactic pattern recognition.
Enrique Vidal, Franck Thollard, Colin de la Higuera, Francisco Casacuberta, Rafael C. Carrasco, "Probabilistic Finite-State Machines-Part I," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 7, pp. 1013-1025, July 2005, doi:10.1109/TPAMI.2005.147