Ninth International Workshop on Frontiers in Handwriting Recognition (IWFHR'04)
Stability Measure of Entropy Estimate and Its Application to Language Model Evaluation
Kokubunji, Tokyo, Japan
October 26-29, 2004
ISBN: 0-7695-2187-8
Jahwan Kim, Korea Advanced Institute of Science and Technology
Sungho Ryu, Korea Advanced Institute of Science and Technology
Jin H. Kim, Korea Advanced Institute of Science and Technology
In this paper we propose a stability measure for entropy estimates based on the principles of Bayesian statistics. Stability, that is, how an estimate varies as the training set does, is a critical issue, especially for problems where the parameter-to-data ratio is extremely high, as in language modeling and text compression. There are two natural estimates of entropy: the classical estimate and the Bayesian estimate. We show that the difference between them is strongly positively correlated with the variance of the classical estimate when that variance is not negligible, and we propose this difference as a stability measure for the entropy estimate. To evaluate it for language models, where parameter estimates are available but a posterior distribution generally is not, we suggest using a Dirichlet distribution whose expectation agrees with the estimated parameters while preserving the total count. Experiments on two benchmark corpora show that the proposed measure indeed reflects the stability of classical entropy estimates.
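To make the contrast concrete, the following is a minimal sketch (not the authors' code) of the two estimates mentioned in the abstract: the classical plug-in entropy of observed counts and the posterior-mean entropy under a Dirichlet posterior. The closed-form posterior mean, E[H] = psi(A+1) - sum_i (a_i/A) psi(a_i+1) with digamma function psi and A = sum_i a_i, is a standard result for Dirichlet-distributed probabilities; the symmetric Dirichlet prior with concentration alpha used here is an illustrative assumption, not the paper's specific construction.

    import numpy as np
    from scipy.special import digamma

    def classical_entropy(counts):
        # Plug-in (maximum-likelihood) entropy estimate, in nats.
        p = counts / counts.sum()
        p = p[p > 0]
        return -np.sum(p * np.log(p))

    def bayesian_entropy(counts, alpha=1.0):
        # Posterior-mean entropy under a symmetric Dirichlet(alpha) prior.
        a = counts + alpha          # Dirichlet posterior parameters
        A = a.sum()
        return digamma(A + 1) - np.sum((a / A) * digamma(a + 1))

    counts = np.array([50, 30, 15, 4, 1], dtype=float)
    h_cl = classical_entropy(counts)
    h_bayes = bayesian_entropy(counts)
    # The gap between the two estimates is the kind of quantity the paper
    # proposes as a stability indicator: a larger gap suggests a less
    # stable classical estimate.
    print(f"classical = {h_cl:.4f} nats, Bayesian = {h_bayes:.4f} nats, "
          f"difference = {h_bayes - h_cl:.4f}")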
Citation:
Jahwan Kim, Sungho Ryu, Jin H. Kim, "Stability Measure of Entropy Estimate and Its Application to Language Model Evaluation," Ninth International Workshop on Frontiers in Handwriting Recognition (IWFHR'04), pp. 456-461, 2004.