Ninth International Workshop on Frontiers in Handwriting Recognition (2004)

Kokubunji, Tokyo, Japan

Oct. 26, 2004 to Oct. 29, 2004

ISSN: 1550-5235

ISBN: 0-7695-2187-8

pp: 456-461

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/IWFHR.2004.98

Jahwan Kim, Korea Advanced Institute of Science and Technology

Jin H. Kim, Korea Advanced Institute of Science and Technology

Sungho Ryu, Korea Advanced Institute of Science and Technology

ABSTRACT

We propose in this paper a stability measure for entropy estimates based on the principles of Bayesian statistics. Stability, or how estimates vary as the training set does, is a critical issue, especially for problems where the parameter-to-data ratio is extremely high, as in language modeling and text compression. There are two natural estimates of entropy: the classical estimate and the Bayesian estimate. We show that the difference between them is strongly positively correlated with the variance of the classical estimate when that variance is not small, and we propose this difference as a stability measure for entropy estimates. To evaluate it for language models, where parameter estimates are available but the posterior distribution in general is not, we suggest using a Dirichlet distribution whose expectation agrees with the estimated parameters and which preserves the total count. Experiments on two benchmark corpora show that the proposed measure indeed reflects the stability of classical entropy estimates.
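The abstract's idea can be illustrated with a small sketch. The function names and the Monte Carlo evaluation below are illustrative assumptions, not the paper's actual procedure (the abstract does not give formulas): the classical estimate is the plug-in entropy of the relative frequencies, the Dirichlet distribution is chosen so that its mean matches those frequencies while its concentration equals the total count (alpha_i = n_i), and the Bayesian estimate is approximated as the average entropy over samples from that Dirichlet. The gap between the two estimates serves as the stability measure.

```python
import math
import random

def classical_entropy(counts):
    """Plug-in (MLE) entropy estimate, in nats."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

def bayesian_entropy(counts, num_samples=20000, seed=0):
    """Monte Carlo estimate of the expected entropy under a Dirichlet
    whose mean matches the relative frequencies and whose concentration
    equals the total count (alpha_i = n_i, an assumed choice here)."""
    rng = random.Random(seed)
    alphas = [c for c in counts if c > 0]
    total = 0.0
    for _ in range(num_samples):
        # Sample from Dirichlet(alphas) via normalized Gamma variates.
        gammas = [rng.gammavariate(a, 1.0) for a in alphas]
        s = sum(gammas)
        total += -sum((g / s) * math.log(g / s) for g in gammas)
    return total / num_samples

def stability_gap(counts):
    """Difference between the classical and Bayesian estimates;
    by Jensen's inequality it is nonnegative, and it shrinks as
    the sample size grows, i.e., as the estimate stabilizes."""
    return classical_entropy(counts) - bayesian_entropy(counts)
```

For example, the counts [2, 2] and [200, 200] give the same plug-in entropy (log 2), but the gap is much larger for the smaller sample, reflecting the lower stability of its estimate.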


CITATION

Jahwan Kim, Jin H. Kim, Sungho Ryu, "Stability Measure of Entropy Estimate and Its Application to Language Model Evaluation", *Ninth International Workshop on Frontiers in Handwriting Recognition*, pp. 456-461, 2004, doi:10.1109/IWFHR.2004.98