Kokubunji, Tokyo, Japan
Oct. 26, 2004 to Oct. 29, 2004
ISBN: 0-7695-2187-8
pp: 456-461
Jahwan Kim, Korea Advanced Institute of Science and Technology
Sungho Ryu, Korea Advanced Institute of Science and Technology
Jin H. Kim, Korea Advanced Institute of Science and Technology
ABSTRACT
We propose a stability measure for entropy estimates based on the principles of Bayesian statistics. Stability, or how an estimate varies as the training set does, is a critical issue, especially for problems where the parameter-to-data ratio is extremely high, as in language modeling and text compression. There are two natural estimates of entropy: the classical estimate and the Bayesian estimate. We show that their difference is strongly positively correlated with the variance of the classical estimate when that variance is not small, and we propose this difference as a stability measure for entropy estimates. To evaluate it for language models, where point estimates are available but the posterior distribution in general is not, we suggest using a Dirichlet distribution whose expectation agrees with the estimated parameters while preserving the total count. Experiments on two benchmark corpora show that the proposed measure indeed reflects the stability of classical entropy estimates.
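The abstract contrasts a classical (plug-in) entropy estimate with a Bayesian one and uses their difference as a stability signal. The sketch below is illustrative only, not the authors' code: it takes the Bayesian estimate to be the posterior-mean entropy of a multinomial under a symmetric Dirichlet prior (using the standard digamma closed form); the function names, the prior parameter `a`, and the choice of absolute difference are assumptions for this example.

```python
from math import log

def _digamma(x):
    """Digamma function via recurrence plus a short asymptotic series."""
    r = 0.0
    while x < 6.0:
        r -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return r + log(x) - 0.5 / x - f * (1.0/12 - f * (1.0/120 - f / 252))

def classical_entropy(counts):
    """Classical (maximum-likelihood plug-in) entropy estimate, in nats."""
    n = sum(counts)
    return -sum(c / n * log(c / n) for c in counts if c > 0)

def bayes_entropy(counts, a=1.0):
    """Posterior-mean entropy under a symmetric Dirichlet(a) prior:
    E[H] = psi(A+1) - sum_i (alpha_i/A) psi(alpha_i+1),
    with alpha_i = n_i + a and A = sum_i alpha_i."""
    alphas = [c + a for c in counts]
    total = sum(alphas)
    return _digamma(total + 1) - sum(
        ai / total * _digamma(ai + 1) for ai in alphas)

def stability_measure(counts, a=1.0):
    """Difference of the two estimates, used here as a stability proxy."""
    return abs(bayes_entropy(counts, a) - classical_entropy(counts))
```

For counts drawn from a small sample, e.g. `[5, 3, 2]`, the two estimates differ noticeably; as the counts grow proportionally, the difference shrinks, which is the behavior the paper exploits as a stability indicator.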
CITATION
Jahwan Kim, Sungho Ryu, Jin H. Kim, "Stability Measure of Entropy Estimate and Its Application to Language Model Evaluation", Proceedings of the Ninth International Workshop on Frontiers in Handwriting Recognition (IWFHR), 2004, pp. 456-461, doi:10.1109/IWFHR.2004.98