Proceedings 17th IEEE Annual Conference on Computational Complexity (2002)
May 21, 2002 to May 24, 2002
ISBN: 0-7695-1468-5
pp: 0017
Tugkan Batu , University of Pennsylvania
Sanjoy Dasgupta , AT&T Labs-Research
Ravi Kumar , IBM Almaden Research Center
Ronitt Rubinfeld , NEC Research Institute
ABSTRACT
We consider the problem of approximating the entropy of a discrete distribution under several models. If the distribution is given explicitly as an array where the $i$-th location is the probability of the $i$-th element, then linear time is both necessary and sufficient for approximating the entropy. We consider a model in which the algorithm is given access only to independent samples from the distribution. Here, we show that a $\gamma$-multiplicative approximation to the entropy can be obtained in $O\left(n^{(1+\eta)/\gamma^2} \,\mathrm{poly}(\log n)\right)$ time for distributions with entropy $\Omega(\gamma/\eta)$, where $n$ is the size of the domain of the distribution and $\eta$ is an arbitrarily small positive constant. We show that one cannot get a multiplicative approximation to the entropy in general in this model. Even for the class of distributions to which our upper bound applies, we obtain a lower bound of $\Omega\left(n^{\max(1/(2\gamma^2),\,2/(5\gamma^2-2))}\right)$. We next consider a hybrid model in which both the explicit distribution as well as independent samples are available. Here, significantly more efficient algorithms can be achieved: a $\gamma$-multiplicative approximation to the entropy can be obtained in $O\left(\frac{\gamma^2 \log^2 n}{h^2 (\gamma-1)^2}\right)$ time for distributions with entropy $\Omega(h)$; we show a lower bound of $\Omega\left(\frac{\log n}{h(\gamma^2-1)}\right)$. Finally, we consider two special families of distributions: those for which the probability of an element decreases monotonically in the label of the element, and those that are uniform over a subset of the domain. In each case, we give more efficient algorithms for approximating the entropy.
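As an illustration of the explicit and sample-only models named in the abstract, the following minimal Python sketch computes the entropy of an explicitly given probability array in one linear pass and forms a naive plug-in estimate from independent samples. It is not the algorithm of the paper. The function names, the use of base-2 logarithms, and the reading of a $\gamma$-multiplicative approximation as an estimate $A$ with $H/\gamma \le A \le \gamma H$ are assumptions made here for illustration only.

```python
import math
import random
from collections import Counter


def entropy(probs):
    """Shannon entropy (in bits, an assumed convention) of an explicit
    probability array, computed in a single linear pass."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def plugin_entropy(samples):
    """Naive plug-in estimate: entropy of the empirical distribution.
    This is a baseline for illustration, not the paper's estimator."""
    counts = Counter(samples)
    total = len(samples)
    return entropy(c / total for c in counts.values())


def is_gamma_approx(estimate, true_value, gamma):
    """Check the assumed definition H / gamma <= estimate <= gamma * H."""
    return true_value / gamma <= estimate <= gamma * true_value


if __name__ == "__main__":
    # Hypothetical example: uniform distribution over n = 1024 elements.
    n = 1024
    probs = [1.0 / n] * n
    rng = random.Random(0)
    samples = rng.choices(range(n), weights=probs, k=200)
    h_true = entropy(probs)
    h_est = plugin_entropy(samples)
    print(h_true, h_est, is_gamma_approx(h_est, h_true, gamma=2.0))
```

With far fewer samples than domain elements, the plug-in estimate cannot exceed the logarithm of the sample count, so it tends to underestimate the entropy of a high-entropy distribution. This gives some intuition for why the sample-only model is harder than the explicit one.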
INDEX TERMS
entropy, entropy approximation, black-box distribution, sample complexity, monotone distribution
CITATION

S. Dasgupta, R. Kumar, R. Rubinfeld and T. Batu, "The Complexity of Approximating the Entropy," Proceedings 17th IEEE Annual Conference on Computational Complexity (CCC), Montreal, Canada, 2002, pp. 0017.
doi:10.1109/CCC.2002.1004329