On the Problem of Local Minima in Backpropagation
January 1992 (vol. 14 no. 1)
pp. 76-86

The authors propose a theoretical framework for backpropagation (BP) in order to identify some of its limitations as a general learning procedure and the reasons for its success in several experiments on pattern recognition. The first important conclusion is that examples can be found in which BP gets stuck in local minima. A simple example is presented in which BP can get stuck during gradient descent without having learned the entire training set, even though the example is constructed so that a solution with null cost is guaranteed to exist. Some conditions on the network architecture and the learning environment that ensure the convergence of the BP algorithm are proposed. In particular, convergence is proven to hold if the classes are linearly separable. In this case, the experience gained in several experiments shows that multilayered neural networks (MLNs) exceed perceptrons in generalization to new examples.
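The setting described in the abstract can be illustrated with a minimal sketch of batch gradient descent on a quadratic cost for a network with one hidden layer, trained on a linearly separable problem (logical AND). This is not the authors' example or experiment: the 2-2-1 architecture, sigmoid units, learning rate, weight initialization, and the names `train`, `AND_PATTERNS`, and `AND_TARGETS` are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def train(patterns, targets, n_hidden=2, lr=0.5, epochs=5000, seed=0):
    """Batch gradient descent on a quadratic cost (illustrative sketch).

    Returns the cost before training followed by the cost after each epoch.
    """
    rng = random.Random(seed)
    n_in = len(patterns[0])
    # Each row holds the input weights followed by a bias term.
    w_hid = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
             for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + row[-1])
             for row in w_hid]
        y = sigmoid(sum(w * hi for w, hi in zip(w_out, h)) + w_out[-1])
        return h, y

    def cost():
        return 0.5 * sum((forward(x)[1] - t) ** 2
                         for x, t in zip(patterns, targets))

    history = [cost()]
    for _ in range(epochs):
        g_hid = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        g_out = [0.0] * (n_hidden + 1)
        for x, t in zip(patterns, targets):
            h, y = forward(x)
            # Error signal at the output unit.
            d_out = (y - t) * y * (1.0 - y)
            for j in range(n_hidden):
                g_out[j] += d_out * h[j]
                # Error signal propagated back to hidden unit j.
                d_hid = d_out * w_out[j] * h[j] * (1.0 - h[j])
                for i in range(n_in):
                    g_hid[j][i] += d_hid * x[i]
                g_hid[j][-1] += d_hid
            g_out[-1] += d_out
        # Update all weights only after the full batch gradient is known.
        for j in range(n_hidden):
            for i in range(n_in + 1):
                w_hid[j][i] -= lr * g_hid[j][i]
        for j in range(n_hidden + 1):
            w_out[j] -= lr * g_out[j]
        history.append(cost())
    return history


# Logical AND: a linearly separable two-class problem (assumed toy data).
AND_PATTERNS = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
AND_TARGETS = [0.0, 0.0, 0.0, 1.0]

if __name__ == "__main__":
    hist = train(AND_PATTERNS, AND_TARGETS)
    print(f"initial cost {hist[0]:.4f}, final cost {hist[-1]:.4f}")
```

Replacing the training set with one that is not linearly separable and varying `seed` is one way to probe, informally, whether descent stalls at a nonzero cost; the sketch itself proves nothing about convergence.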


Index Terms:
learning systems; local minima; backpropagation; pattern recognition; network architecture; convergence; multilayered neural networks; perceptrons; neural nets
M. Gori, A. Tesi, "On the Problem of Local Minima in Backpropagation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 1, pp. 76-86, Jan. 1992, doi:10.1109/34.107014