How Bad May Learning Curves Be?
October 2000 (vol. 22 no. 10)
pp. 1155-1167

Abstract—In this paper, we motivate the need for estimating bounds on the learning curves of average-case learning algorithms when they perform worst on training samples. We then apply the method of reducing learning problems to hypothesis-testing ones to investigate the learning curves of a so-called ill-disposed learning algorithm in terms of a system complexity, the Boolean interpolation dimension. Since the ill-disposed algorithm behaves worse than ordinary ones, and the Boolean interpolation dimension is generally bounded by the number of system weights, the results can be applied to interpreting or bounding the worst-case learning curve in real learning situations. This study leads to a new understanding of worst-case generalization in real learning situations, one that differs significantly from that in the uniformly learnable setting obtained via Vapnik-Chervonenkis (VC) dimension analysis. We illustrate the results with numerical simulations.

[1] V.N. Vapnik, Estimation of Dependences Based on Empirical Data. New York: Springer-Verlag, 1982.
[2] L.G. Valiant, “A Theory of the Learnable,” Comm. ACM, vol. 27, no. 11, pp. 1134-1142, Nov. 1984.
[3] A. Blumer, A. Ehrenfeucht, D. Haussler, and M. Warmuth, “Learnability and the Vapnik-Chervonenkis Dimension,” J. ACM, vol. 36, pp. 929-965, 1989.
[4] D. Cohn and G. Tesauro, “How Tight are the Vapnik-Chervonenkis Bounds?” Neural Computation, vol. 4, pp. 249-269, 1992.
[5] D. Haussler, M. Kearns, M. Opper, and R. Schapire, “Bounds on the Sample Complexity of Bayesian Learning Using Information Theory and the VC Dimension,” Proc. Fourth Ann. Workshop Computational Learning Theory, pp. 61-74, 1991.
[6] S. Amari, “A Universal Theorem on Learning Curves,” Neural Networks, vol. 6, pp. 161-166, 1993.
[7] S. Amari, N. Fujita, and S. Shinomoto, “Four Types of Learning Curves,” Neural Computation, vol. 4, pp. 605-618, 1992.
[8] N. Tishby, E. Levin, and S. Solla, “Consistent Inference of Probabilities in Layered Networks: Predictions and Generalizations,” Proc. Int'l Joint Conf. Neural Networks, vol. 2, pp. 403-409, 1989.
[9] D.B. Schwartz, V.K. Samalam, S.A. Solla, and J.S. Denker, “Exhaustive Learning,” Neural Computation, vol. 2, pp. 374-385, 1990.
[10] D. Haussler, H.S. Seung, M. Kearns, and N. Tishby, “Rigorous Learning Curve Bounds from Statistical Mechanics,” preprint, 1994.
[11] S.B. Holden and M. Niranjan, “On the Practical Applicability of VC Dimension Bounds,” preprint, 1994.
[12] H. Gu and H. Takahashi, “Estimating Learning Curves of Concept Learning,” Neural Networks, vol. 10, no. 6, pp. 1089-1102, 1997.
[13] H. Gu and H. Takahashi, “Towards More Practical Average Bounds on Supervised Learning,” IEEE Trans. Neural Networks, vol. 7, pp. 953-968, 1996.
[14] H. Gu and H. Takahashi, “Learning Curves in Learning with Noise—An Empirical Study,” IEICE Trans. Information and Systems, vol. E80-D, no. 1, pp. 78-85, 1997.
[15] H. Gu and H. Takahashi, “Exponential or Polynomial Learning Curves?—Case-Based Studies,” Neural Computation, vol. 12, no. 4, pp. 795-809, 2000.
[16] D. Haussler, “Quantifying Inductive Bias—AI Learning Algorithms and Valiant's Learning Framework,” Artificial Intelligence, vol. 36, pp. 177-221, 1988.
[17] A. Macintyre and E. Sontag, “Finiteness Results for Sigmoidal ‘Neural’ Networks,” Proc. 25th Ann. Symp. Theory of Computing, pp. 325-334, May 1993.
[18] H. Takahashi and H. Gu, “A Tight Bound on Concept Learning,” IEEE Trans. Neural Networks, vol. 9, no. 6, pp. 1192-1202, 1998.
[19] R.K. Montoye, E. Hokenek, and S.L. Runyon, “Design of the IBM RISC System/6000 Floating-Point Execution Unit,” IBM J. Research and Development, vol. 34, pp. 59-70, Jan. 1990.
[20] E. Wang, “Neural Network Classification: A Bayesian Interpretation,” IEEE Trans. Neural Networks, vol. 1, no. 4, pp. 303-305, 1990.
[21] H. Gu, “Towards a Practically Applicable Theory of Learning Curves in Neural Networks,” doctoral dissertation, The Univ. of Electro-Comm., Tokyo, 1997.

Index Terms:
Generalization, concept learning, generalization error, learning curves, sample complexity, PAC learning, worst-case learning, interpolation dimension.
Hanzhong Gu, Haruhisa Takahashi, "How Bad May Learning Curves Be?," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 10, pp. 1155-1167, Oct. 2000, doi:10.1109/34.879795