Probabilistic Context-Free Grammars Estimated from Infinite Distributions
August 2007 (vol. 29, no. 8)
pp. 1379-1393
Anna Corazza and Giorgio Satta, IEEE Computer Society
In this paper, we consider probabilistic context-free grammars, a class of generative devices that has been successfully exploited in several applications of syntactic pattern matching, especially in statistical natural language parsing. We investigate the problem of training probabilistic context-free grammars on distributions defined over an infinite set of trees or an infinite set of sentences by minimizing the cross-entropy. This problem arises in the context-free approximation of distributions generated by more expressive statistical models. We show several interesting theoretical properties of probabilistic context-free grammars estimated in this way, including the previously unknown equivalence between the cross-entropy of the grammar with respect to the input distribution and the so-called derivational entropy of the grammar itself. We discuss important consequences of these results for the standard application of the maximum-likelihood estimator to finite tree and sentence samples, as well as for other finite-state models such as Hidden Markov Models and probabilistic finite automata.
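
The finite-sample setting the abstract refers to can be made concrete. What follows is a minimal sketch, not taken from the paper, of the standard relative-frequency (maximum-likelihood) estimator for PCFG rule probabilities from a finite tree sample, together with the empirical cross-entropy of the sample under the resulting grammar; the paper's results concern the analogous minimization against infinite distributions over trees or sentences. All function and variable names here are illustrative.

    from collections import defaultdict
    import math

    def ml_estimate(trees):
        # Relative-frequency (maximum-likelihood) estimation of PCFG rule
        # probabilities from a finite sample of derivation trees. Each tree
        # is represented as the list of rules (lhs, rhs) in its derivation.
        rule_count = defaultdict(int)
        lhs_count = defaultdict(int)
        for tree in trees:
            for lhs, rhs in tree:
                rule_count[(lhs, rhs)] += 1
                lhs_count[lhs] += 1
        return {(lhs, rhs): c / lhs_count[lhs]
                for (lhs, rhs), c in rule_count.items()}

    def tree_log_prob(tree, probs):
        # Log-probability the estimated grammar assigns to one derivation.
        return sum(math.log(probs[rule]) for rule in tree)

    # Toy treebank: S -> A A occurs twice, S -> A once; A -> 'a' throughout.
    sample = [
        [("S", ("A", "A")), ("A", ("a",)), ("A", ("a",))],
        [("S", ("A", "A")), ("A", ("a",)), ("A", ("a",))],
        [("S", ("A",)), ("A", ("a",))],
    ]
    probs = ml_estimate(sample)
    # Empirical cross-entropy (nats per tree) of the sample under the
    # estimated grammar; the paper studies the analogous quantity when the
    # finite sample is replaced by an infinite distribution.
    cross_entropy = -sum(tree_log_prob(t, probs) for t in sample) / len(sample)
    print(probs)
    print(cross_entropy)

On this toy sample the estimator assigns S -> A A probability 2/3, S -> A probability 1/3, and A -> 'a' probability 1, which is exactly the relative frequency of each rule among expansions of its left-hand side.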
Index Terms:
Probabilistic context-free grammars, maximum-likelihood estimation, derivational entropy, cross-entropy, expectation-maximization methods, Hidden Markov Models.
Citation:
Anna Corazza, Giorgio Satta, "Probabilistic Context-Free Grammars Estimated from Infinite Distributions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 8, pp. 1379-1393, Aug. 2007, doi:10.1109/TPAMI.2007.1065