M. Gori and A. Tesi, "On the Problem of Local Minima in Backpropagation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 1, pp. 76-86, Jan. 1992.
The authors propose a theoretical framework for backpropagation (BP) in order to identify some of its limitations as a general learning procedure, as well as the reasons for its success in several pattern recognition experiments. The first important conclusion is that examples can be found in which BP gets stuck in local minima. A simple example is presented in which BP can get stuck during gradient descent without having learned the entire training set, even though a solution with null cost is guaranteed to exist. Some conditions on the network architecture and the learning environment that ensure the convergence of the BP algorithm are then proposed. In particular, it is proven that convergence holds if the classes are linearly separable. In this case, experimental experience shows that multilayered neural networks (MLNs) outperform perceptrons in generalizing to new examples.
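The convergence claim for linearly separable classes can be illustrated numerically. The sketch below is not from the paper: it trains a single sigmoid unit by batch gradient descent on a small, linearly separable two-class set (the data, learning rate, and iteration count are all illustrative assumptions). Under the separability condition, descent should drive the squared-error cost down until every example is classified correctly.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Assumed linearly separable training set: class 1 roughly iff x1 + x2 > 1.
data = [((0.0, 0.0), 0), ((0.2, 0.3), 0), ((1.0, 1.0), 1), ((0.9, 0.8), 1)]

w = [0.0, 0.0]  # weights
b = 0.0         # bias
lr = 1.0        # illustrative learning rate

for _ in range(5000):
    gw, gb = [0.0, 0.0], 0.0
    for (x1, x2), t in data:
        y = sigmoid(w[0] * x1 + w[1] * x2 + b)
        # Gradient of 0.5*(y - t)^2 wrt the pre-activation: (y - t) * y * (1 - y)
        d = (y - t) * y * (1.0 - y)
        gw[0] += d * x1
        gw[1] += d * x2
        gb += d
    w[0] -= lr * gw[0]
    w[1] -= lr * gw[1]
    b -= lr * gb

# With separable data, gradient descent finds a separating hyperplane.
correct = all((sigmoid(w[0] * x1 + w[1] * x2 + b) > 0.5) == (t == 1)
              for (x1, x2), t in data)
print(correct)
```

Note that this single-unit case sidesteps the paper's central point: with hidden layers and non-separable patterns, the cost surface can contain local minima in which the same descent procedure gets stuck.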