Probabilistic Finite-State Machines-Part II
July 2005 (vol. 27 no. 7)
pp. 1026-1039
Probabilistic finite-state machines are used today in a variety of areas in pattern recognition or in fields to which pattern recognition is linked. In Part I of this paper, we surveyed these objects and studied their properties. In this Part II, we study the relations between probabilistic finite-state automata and other well-known devices that generate strings, such as hidden Markov models and n-grams, and provide theorems, algorithms, and properties that represent the current state of the art for these objects.


Index Terms:
Automata, classes defined by grammars or automata, machine learning, language acquisition, language models, language parsing and understanding, machine translation, speech recognition and synthesis, structural pattern recognition, syntactic pattern recognition.
Citation:
Enrique Vidal, Frank Thollard, Colin de la Higuera, Francisco Casacuberta, Rafael C. Carrasco, "Probabilistic Finite-State Machines-Part II," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 7, pp. 1026-1039, July 2005, doi:10.1109/TPAMI.2005.148