Optimal Probabilistic Evaluation Functions for Search Controlled by Stochastic Context-Free Grammars
October 1994 (vol. 16 no. 10)
pp. 1018-1027

The use of stochastic context-free grammars (SCFGs) in language modeling (LM) has been considered previously. When these grammars are used, search can be directed by evaluation functions based on the probability that an SCFG generates a sentence, given only some of its words. Jelinek and Lafferty (1991) proposed expressions for computing the evaluation function for the recognition of word sequences when only the prefix of a sequence is known. Corazza et al. (1991) proposed methods for probability computation in the more general case in which partial word sequences (islands) interleaved with gaps are known. This computation is too complex in practice unless the lengths of the gaps are known. This paper proposes a method for computing the probability of the best parse tree that can generate a sentence of which only part (consisting of islands and gaps) is known. This probability is the smallest possible, and thus the most informative, upper bound that can be used in the evaluation function. Computing the proposed upper bound has cubic time complexity even if the lengths of the gaps are unknown. This makes it practical to use SCFGs for driving interpretations of sentences in natural language processing.
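
To make the "probability of the best parse tree" concrete, the following is a minimal sketch of a Viterbi-style CKY recursion for an SCFG in Chomsky normal form. It is not the method of the paper: it only handles gaps of known length (each unknown word is marked by a hypothetical GAP token), whereas the paper addresses gaps of unknown length. The toy grammar, the rule probabilities, and the names BINARY, LEXICAL, GAP, and best_parse_prob are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy SCFG in Chomsky normal form (not taken from the paper).
# Binary rules: (parent, left child, right child) -> probability
BINARY = {
    ("S", "NP", "VP"): 1.0,
    ("VP", "V", "NP"): 1.0,
}
# Lexical rules: (parent, word) -> probability
LEXICAL = {
    ("NP", "dogs"): 0.6,
    ("NP", "cats"): 0.4,
    ("V", "chase"): 0.7,
    ("V", "see"): 0.3,
}
GAP = None  # assumed marker for one unknown word (a gap of known length 1)


def best_parse_prob(words, start="S"):
    """Return the maximum probability of a parse tree rooted in `start`.

    A GAP position may be filled by any word, so the result is the best
    tree probability over all completions of the known words (islands).
    """
    n = len(words)
    # best[i][j][A] = max probability that A derives positions i..j-1
    best = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]

    # Spans of length 1: use lexical rules; a GAP matches every word.
    for i, w in enumerate(words):
        for (a, word), p in LEXICAL.items():
            if w is GAP or w == word:
                best[i][i + 1][a] = max(best[i][i + 1][a], p)

    # Longer spans: three nested position loops give cubic time in n.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (a, b, c), p in BINARY.items():
                    score = p * best[i][k][b] * best[k][j][c]
                    if score > best[i][j][a]:
                        best[i][j][a] = score
    return best[0][n][start]
```

Under this toy grammar, best_parse_prob(["dogs", "chase", "cats"]) is about 0.168, while best_parse_prob(["dogs", GAP, GAP]) is about 0.252. The second value bounds from above the best-tree probability of every three-word sentence that starts with "dogs", which is the role an upper bound plays in an evaluation function for search.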

[1] F. Jelinek and J. D. Lafferty, "Computation of the probability of initial substring generation by stochastic context free grammars," Computat. Linguist., vol. 17, no. 3, pp. 315-323, 1991.
[2] A. Corazza, R. De Mori, R. Gretter, and G. Satta, "Computation of probabilities for a stochastic island-driven parser," IEEE Trans. Pattern Anal. Machine Intell., vol. 13, no. 9, pp. 936-950, 1991.
[3] W. A. Woods, "Optimal search strategies for speech understanding control," Artificial Intell., vol. 18, no. 3, pp. 295-326, 1982.
[4] F. Jelinek, J. D. Lafferty, and R. L. Mercer, "Basic methods of probabilistic context free grammars," in Speech Recognition and Understanding, R. De Mori and P. Laface, Eds. Berlin, Germany: Springer-Verlag, 1992.
[5] R. C. Gonzalez and M. G. Thomason, Syntactic Pattern Recognition. Reading, MA: Addison-Wesley, 1978.
[6] C. S. Wetherell, "Probabilistic languages: A review and some open questions," Comput. Surveys, vol. 12, pp. 361-379, 1980.
[7] M. A. Harrison, Introduction to Formal Language Theory. Reading, MA: Addison-Wesley, 1978.
[8] D. H. Younger, "Recognition and parsing of context-free languages in time n^3," Inform. Contr., vol. 10, pp. 189-208, 1967.
[9] A. V. Aho and J. D. Ullman, The Theory of Parsing, Translation, and Compiling, Vol. 1: Parsing. Englewood Cliffs, NJ: Prentice-Hall, 1972.
[10] K. S. Fu, Syntactic Pattern Recognition and Applications. Englewood Cliffs, NJ: Prentice-Hall, 1982.
[11] A. Salomaa, "Probabilistic and weighted grammars," Inform. Contr., vol. 15, pp. 529-544, 1969.
[12] H. C. Lee and K. S. Fu, "A stochastic syntax analysis procedure and its application to pattern classification," IEEE Trans. Comput., vol. C-4, no. 3, pp. 660-666, 1972.
[13] E. Persoon and K. S. Fu, "Sequential classification of strings generated by stochastic context-free grammars," Int. J. Comput. Inform. Sci., vol. 4, no. 3, pp. 205-218, 1975.
[14] S. Y. Lu and K. S. Fu, "Stochastic error-correcting syntax analysis for recognition of noisy patterns," IEEE Trans. Comput., vol. C-26, no. 12, pp. 1268-1276, 1977.
[15] R. Gretter, "Upper-bounds for theories composed by islands and gaps and generated by stochastic context-free grammars--General case," manuscript, IRST, Trento, Italy, 1993.
[16] E. Tanaka and K. S. Fu, "Error-correcting parsers for formal languages," IEEE Trans. Comput., vol. C-27, no. 7, pp. 605-616, 1978.

Index Terms:
context-sensitive grammars; computational complexity; natural languages; probability; optimal probabilistic evaluation functions; search; stochastic context-free grammars; language modeling; probabilities; word sequences; probability computation; partial word sequences; best parse tree; cubic time complexity; natural language processing
A. Corazza, R. De Mori, R. Gretter, G. Satta, "Optimal Probabilistic Evaluation Functions for Search Controlled by Stochastic Context-Free Grammars," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, no. 10, pp. 1018-1027, Oct. 1994, doi:10.1109/34.329008