Advances in Feedforward Neural Networks: Demystifying Knowledge Acquiring Black Boxes
April 1996 (vol. 8 no. 2)
pp. 211-226

Abstract—We survey recent research on the supervised training of feedforward neural networks. The goal is to expose how the networks work, how to engineer them so they learn data with less extraneous noise, how to train them efficiently, and how to assure that the training is valid. The scope covers gradient descent and polynomial line search, from backpropagation through conjugate gradients and quasi-Newton methods. There is a consensus among researchers that adaptive step gains (learning rates) can stabilize and accelerate convergence and that a good starting weight set improves both the training speed and the learning quality. The training problem includes both the design of a network function and the fitting of that function to a set of input and output data points by computing a set of coefficient weights. The form of the function can be adjusted by adjoining new neurons, pruning existing ones, and setting other parameters such as biases and exponential rates. Our exposition reveals several useful results that are readily implementable.
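
To make the adaptive step-gain idea concrete, the following is a minimal sketch, not taken from the paper: one-hidden-layer backpropagation in Python/NumPy in which the learning rate grows while the sum-squared error keeps falling and is cut back when it rises. The XOR data, the 2-4-1 sigmoid architecture, and the gain factors 1.05 and 0.5 are illustrative assumptions rather than values prescribed by the survey.

import numpy as np

# Minimal sketch (not from the paper): gradient-descent training of a
# one-hidden-layer sigmoid network with a single adaptive step gain.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # XOR inputs (assumed toy data)
T = np.array([[0], [1], [1], [0]], dtype=float)               # XOR targets

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Small random starting weights; the survey notes that a good starting weight set matters.
W1 = rng.uniform(-0.5, 0.5, size=(2, 4)); b1 = np.zeros(4)    # input -> hidden
W2 = rng.uniform(-0.5, 0.5, size=(4, 1)); b2 = np.zeros(1)    # hidden -> output

eta = 0.5                  # step gain (learning rate), adapted each epoch
prev_sse = np.inf
for epoch in range(5000):
    H = sigmoid(X @ W1 + b1)                  # forward pass
    Y = sigmoid(H @ W2 + b2)
    E = T - Y
    sse = float(np.sum(E ** 2))

    # Adaptive step gain: accelerate while the error falls, back off when it rises.
    eta = eta * 1.05 if sse < prev_sse else eta * 0.5
    prev_sse = sse

    # Backpropagated gradients of the sum-squared error for sigmoid units.
    dY = -2.0 * E * Y * (1.0 - Y)
    dH = (dY @ W2.T) * H * (1.0 - H)
    W2 -= eta * (H.T @ dY); b2 -= eta * dY.sum(axis=0)
    W1 -= eta * (X.T @ dH); b1 -= eta * dH.sum(axis=0)

print("final sum-squared error:", round(prev_sse, 4))

Methods cited in the survey refine this simplest case in several ways, for example by reverting the weight step when the error increases or by keeping a separate adaptive gain per weight (as in delta-bar-delta and Rprop); the single global gain above is only the baseline heuristic.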

References:
[1] E. Barnard and D. Casasent, "Image Processing for Image Understanding With Neural Networks," Proc. IEEE/INNS Int'l Joint Conf. Neural Networks, vol. 1, pp. 111-115, Washington, D.C., 1989.
[2] E.B. Baum and D. Haussler, “What Size Net Gives Valid Generalization?” Neural Computation, vol. 1, pp. 151-160, 1989.
[3] N.K. Bose and A.K. Garga, "Neural Network Design Using Voronoi Diagrams," IEEE Trans. Neural Networks, vol. 4, no. 5, pp. 778-787, 1993.
[4] N.K. Bose and A.K. Garga, "Neural Network Design Using Voronoi Diagrams: Preliminaries," Int'l Joint IEEE/INNS Conf. Neural Networks, vol. 3, pp. 127-132, Baltimore, 1992.
[5] A.N. Burkitt, "Optimization of the Architecture of Feedforward Neural Networks With Hidden Layers by Unit Elimination," Complex Syst., vol. 5, pp. 371-380, 1991.
[6] G. Castellano, A.M. Fanelli, and M. Pelillo, "An Empirical Comparison of Node Pruning Methods for Layered Feed-Forward Neural Networks," Proc. Int'l Joint Conf. Neural Networks, vol. 1, pp. 321-330, Nagoya, Japan, 1993.
[7] J.P. Cater, "Successfully Using Peak Learning Rates of 10 (and Greater) in Backpropagation Networks With the Heuristic Learning Rule," Proc. First IEEE Int'l Conf. Neural Networks, vol. 2, pp. 645-651, San Diego, 1987.
[8] Y. Chauvin, "A Back-Propagation Algorithm With Optimal Use of Hidden Units," D.S. Touretzky, ed., Advances in Neural Information Processing Systems 1. San Mateo, Calif.: Morgan Kaufmann, pp. 519-526, 1989.
[9] D. Chester, "Why Two Hidden Layers Are Better Than One," Proc. IEEE Joint Int'l Conf. Neural Networks, vol. 1, pp. 265-268, Washington, D.C., 1990.
[10] E.D. Dahl, "Accelerated Learning Using the Generalized Delta Rule," Proc. First IEEE Int'l Conf. Neural Networks, vol. 2, pp. 523-530, San Diego, 1987.
[11] J. de Villiers and E. Barnard, "Backpropagation Neural Nets With One and Two Hidden Layers," IEEE Trans. Neural Networks, vol. 4, no. 1, pp. 136-141, 1993.
[12] H.A.C. Eaton and T.L. Olivier, "Learning Coefficient Dependence on Training Set Size," Neural Networks, vol. 5, pp. 283-288, 1992.
[13] S.E. Fahlman and C. Lebiere, "The Cascade-Correlation Learning Architecture," Tech. Report CMU-CS-90-100, Carnegie Mellon Univ., 1990.
[14] S.E. Fahlman, "An Empirical Study of Learning Speed in Backpropagation," Tech. Report CMU-CS-88-162, Carnegie Mellon Univ., 1988.
[15] L. Fausett, Fundamentals of Neural Networks. Englewood Cliffs, N.J.: Prentice Hall, 1994.
[16] R. Fletcher and C.M. Reeves, "Function Minimization by Conjugate Gradients," Computer J., vol. 7, pp. 149-154, 1964.
[17] R. Fletcher, Practical Methods of Optimization. John Wiley and Sons, second ed., 1987.
[18] P.E. Gill, W. Murray, and M.H. Wright, Practical Optimization. New York: Academic Press, 1981.
[19] M. Gutierrez, J. Wang, and R. Grondin, "Estimating Hidden Unit Number for Two-Layer Perceptrons," Proc. IEEE/INNS Int'l Joint Conf. Neural Networks, vol. 1, pp. 677-681, Washington, D.C., 1989.
[20] M. Hagiwara, "Removal of Hidden Units and Weights for Backpropagation Networks," Proc. Int'l Joint Conf. Neural Networks, vol. 1, pp. 351-354, Nagoya, Japan, 1993.
[21] M. Hayashi, "A Fast Algorithm for the Hidden Units in a Multilayer Perceptron," Proc. Int'l Joint Conf. Neural Networks, vol. 1, pp. 339-342, Nagoya, Japan, 1993.
[22] M.R. Hestenes and E.L. Stiefel, "Methods of Conjugate Gradients for Solving Linear Systems," J. Research Nat'l Bureau of Standards, vol. 49, no. 6, pp. 409-436, 1952.
[23] G.E. Hinton, “Connectionist Learning Procedures,” Artificial Intelligence, vol. 40, pp. 185–234, 1989.
[24] M. Hoehfeld and S.E. Fahlman, "Learning With Limited Numerical Precision Using Cascade Correlation Algorithm," IEEE Trans. Neural Networks, vol. 3, no. 4, pp. 602-611, 1992.
[25] K. Hornik, M. Stinchcombe, and H. White, "Multilayer Feedforward Networks Are Universal Approximators," Neural Networks, vol. 2, pp. 359-366, 1989.
[26] S.C. Huang and Y.F. Huang, "Bounds on the Number of Hidden Neurons in Multilayer Perceptrons," IEEE Trans. Neural Networks, vol. 2, no. 1, pp. 47-55, 1991.
[27] F.K. Hwang, “Comments on Reliable Loop Topologies for Large Local Computer Networks,” IEEE Trans. Computers, vol. 36, no. 3, pp. 383-384, Mar. 1987.
[28] R.A. Jacobs, "Increased Rates of Convergence Through Learning Rate Adaptation," Neural Networks, vol. 1, no. 4, pp. 295-307, 1988.
[29] E.M. Johansson, F.U. Dowla, and D.M. Goodman, "Backpropagation Learning for Multilayer Feedforward Neural Networks Using the Conjugate Gradient Method," Int'l J. Neural Systems, vol. 2, no. 4, pp. 291-301, 1992.
[30] S. Judd, "Learning in Networks Is Hard," Proc. First IEEE Int'l Conf. Neural Networks, vol. 2, pp. 685-692, San Diego, 1987.
[31] E.D. Karnin, "A Simple Procedure for Pruning Back-Propagation Trained Neural Networks," IEEE Trans. Neural Networks, vol. 1, no. 2, pp. 239-242, June 1990.
[32] J.K. Kruschke, "Improving Generalization in Backpropagation Networks With Distributed Bottlenecks," Proc. IEEE/INNS Int'l Joint Conf. Neural Networks, vol. 1, pp. 443-447, Washington, D.C., 1989.
[33] S.Y. Kung, K. Diamantaras, W.D. Mao, and J.S. Taur, "Generalized Perceptron Networks With Nonlinear Discriminant Functions," R.J. Mammone and Y.Y. Zeevi, eds., Neural Networks Theory and Applications. New York: Academic Press, pp. 245-279, 1991.
[34] K.D. Wagner, C.K. Chin, and E.J. McCluskey, "Pseudorandom Testing," IEEE Trans. Computers, vol. C-36, pp. 332-343, Mar. 1987.
[35] A. Lapedes, "How Neural Networks Work," Neural Info. Proc. Sys., pp. 442-456, 1988.
[36] B. Gudmundsson and M. Randen, “Incremental Generation of Projections of CT-Volumes,” Proc. First Conf. Visualization and Biomedical Computing, IEEE Press, Piscataway, N.J., 1990, pp. 27-34.
[37] M. Levene and G. Loizou, "Semantics for Null Extended Nested Relations," ACM Trans. Database Systems, vol. 18, no. 3, pp. 414-459, 1993.
[38] G. Li, H. Alnuweiri, and W. Wu, "Acceleration of Backpropagation Through Initial Weight Pre-Training With Delta Rule," Proc. IEEE Int'l Conf. Neural Networks, vol. 1, pp. 580-585, San Francisco, 1993.
[39] C.G. Looney, "Stabilization and Speedup of Convergence in Training Feedforward Neural Networks," Neurocomputing, vol. 10, pp. 7-31, 1996.
[40] C.G. Looney, "Neural Networks As Expert Systems," J. Expert Systems With Applications, vol. 6, no. 2, pp. 129-136, 1993.
[41] J. Makhoul, A. El-Jaroudi, and R. Schwartz, "Formation of Disconnected Decision Regions With a Single Hidden Layer," Proc. IEEE/INNS Int'l Joint Conf. Neural Networks, vol. 1, pp. 455-460, Washington, D.C., 1989.
[42] A. Masahiko, "Mapping Abilities of Three-Layer Neural Networks," Proc. IEEE/INNS Int'l Joint Conf. Neural Networks, vol. 1, pp. 419-423, Washington, D.C., 1989.
[43] C. McCormack and J. Doherty, "Neural Network Super Architectures," Proc. Int'l Joint Conf. Neural Networks, vol. 1, pp. 301-304, Nagoya, Japan, 1993.
[44] K.G. Mehrotra, C.K. Mohan, and S. Ranka, "Bounds on the Number of Samples Needed for Neural Learning," IEEE Trans. Neural Networks, vol. 2, no. 6, pp. 548-558, 1991.
[45] M.L. Minsky and S.A. Papert, Perceptrons. Cambridge, Mass.: MIT Press, 1988.
[46] K.V. Mital, Optimization Methods. New York: Halstead Press, 1976.
[47] M.C. Mozer and P. Smolensky, "Skeletonization: A Technique for Trimming the Fat From a Network Via Relevance Assessment," D.S. Touretzky, ed., Advances in Neural Information Processing Systems 1. San Mateo, Calif.: Morgan Kaufmann, pp. 107-115, 1989.
[48] D. Nguyen and B. Widrow, "Improving the Learning Speed of Two-Layer Neural Networks by Choosing Initial Values of the Adaptive Weights," Proc. IEEE Int'l Joint Conf. Neural Networks, vol. 3, pp. 21-26, San Diego, 1990.
[49] D.B. Parker, "Learning Logic," Technical Report TR-47, MIT Center for Research in Computational Economics and Management Science, Cambridge, Mass., 1985.
[50] A.G. Parlos, B. Fernandez, A.F. Atiya, J. Muthusami, and W.K. Tsai, "An Accelerated Learning Algorithm for Multilayer Perceptron Networks," IEEE Trans. Neural Networks, vol. 5, no. 3, pp. 493-497, 1994.
[51] M. Pelillo and A.M. Fanelli, "A Method of Pruning Layered Feed-Forward Neural Networks," Proc. IWANN (Sitges, Barcelona). Berlin: Springer-Verlag, 1993.
[52] D.C. Plaut, S.J. Nowlan, and G.E. Hinton, "Experiments on Learning by Backpropagation," Technical Report CMU-CS-86-126, Carnegie Mellon Univ., Pittsburgh, Pa., 1986.
[53] H.L. Poh, "A Neural Network Approach for Marketing Strategies Research and Decision Support," PhD thesis, Stanford Univ., 1991.
[54] E. Polak and G. Ribière, "Note sur la convergence de méthodes de directions conjuguées," Revue Française d'Informatique et de Recherche Opérationnelle, vol. 16, pp. 35-43, 1969.
[55] R. Reed, "Pruning algorithms—A survey," IEEE Trans. Neural Networks, vol. 4, no. 5, pp. 740-747, Sept. 1993.
[56] M. Riedmiller and H. Braun, "A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm," Proc. IEEE Int'l Conf. Neural Networks (ICNN '93), IEEE, Piscataway, N.J., 1993, pp. 586-591.
[57] H. Robbins and S. Monro, "A Stochastic Approximation Method," Annals Math. Statistics, vol. 22, pp. 400-407, 1951.
[58] D.E. Rumelhart, G.E. Hinton, and R.J. Williams, "Learning Internal Representations by Error Propagation," Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1: Foundations, D.E. Rumelhart and J.L. McClelland et al., eds., chapter 8, pp. 318-362. Cambridge, Mass.: MIT Press, 1986.
[59] W. Schmidt, S. Raudys, M. Kraaijveld, M. Skurikhina, and R. Duin, "Initializations, Backpropagation and Generalization of Feed-Forward Classifiers," Proc. IEEE Int'l Conf. Neural Networks, vol. 1, pp. 598-604, 1993.
[60] R. Setiono and L.C.K. Hui, "Some n-Bit Parity Problems Are Solvable by Feedforward Networks With Less Than n Hidden Units," Proc. Int'l Joint Conf. Neural Networks, vol. 1, pp. 305-308, Nagoya, Japan, 1993.
[61] D.F. Shanno, "Recent Advances in Numerical Techniques for Large-Scale Optimization," Neural Networks for Control. Cambridge, Mass.: MIT Press, 1990.
[62] Y.L. Shea and C.G. Looney, "Two-Stage Random Optimization of Neural Networks With Sensitivity Based Pruning of Weights," Proc. Golden West Int'l Conf. Intelligent Systems, Reno, Nev., pp. 18-23, 1992.
[63] D.H. Bailey, "Vector Computer Memory Bank Contention," IEEE Trans. Computers, vol. 36, pp. 293-298, 1987.
[64] W.S. Stornetta and B.A. Huberman, "An Improved Three-Layer Backpropagation Algorithm," Proc. First IEEE Int'l Conf. Neural Networks, vol. 2, pp. 645-651, San Diego, 1987.
[65] P.D. Wasserman, Advanced Methods in Neural Computing, Van Nostrand Reinhold, New York, 1993.
[66] R.L. Watrous, "Learning Algorithms for Connectionist Networks: Applied Gradient Methods of Nonlinear Optimization," Proc. First IEEE Int'l Conf. Neural Networks, vol. 2, pp. 619-627,San Diego, 1987.
[67] A.S. Weigend, D.E. Rumelhart, and B.A. Huberman, "Generalization by Weight-Elimination Applied to Currency Exchange Rate Prediction," Proc. Int'l Joint Conf. Neural Networks, vol. 1, pp. 837-841, Seattle, 1991.
[68] A.S. Weigend, B.A. Huberman, and D.E. Rumelhart, "Predicting the Future: A Connectionist Approach," Stanford PDP Research Group Report 90-01, 1990.
[69] A.S. Weigend, D.E. Rumelhart, and B.A. Huberman, "Backpropagation, Weight Elimination, and Time Series Prediction," Proc. Connectionist Models Summer School, pp. 65-80, 1990.
[70] A. Wieland and R. Leighton, "Geometric Analysis of Neural Network Capabilities," Proc. First IEEE Int'l Conf. Neural Networks, vol. 3, pp. 385-392, San Diego, 1987.
[71] P. Werbos, Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences, doctoral dissertation, Harvard Univ., Cambridge, Mass., 1974. Reprinted as P. Werbos, The Roots of Backpropagation: From Ordered Derivatives to Neural Networks and Political Forecasting, John Wiley & Sons, New York, 1994.
[72] D. Whitley and C. Bogart, "The Evolution of Connectivity: Pruning Neural Networks Using Genetic Algorithms," Proc. Int'l Joint Conf. Neural Networks, vol. 1, pp. 134-138, Washington, D.C., 1990.
[73] K. Yamada, H. Kami, and J. Tsukomo, "Handwritten Numeral Recognition by Multilayered Neural Network With Improved Learning Algorithm," IEEE Int'l Joint Conf. Neural Networks, vol. 2, pp. 259-266, Washington, D.C., 1989.
[74] X. Yu, N. Loh, and W. Miller, "A New Acceleration Technique for the Backpropagation Algorithm," Proc. IEEE Int'l Conf. Neural Networks, vol. 3, pp. 1157-1161, San Francisco, 1993.
[75] J.M. Zurada, Introduction to Artificial Neural Systems. West Publishing Company, 1992.
[76] M. Zurada, "Lambda Learning Rule for Feedforward Neural Networks," Proc. IEEE Int'l Conf. Neural Networks, vol. 3, pp. 1808-1811, 1993.

Index Terms:
Feedforward neural networks, multilayered perceptrons, architecture, training, backpropagation, adaptive learning rate, pattern recognition.
Citation:
Carl G. Looney, "Advances in Feedforward Neural Networks: Demystifying Knowledge Acquiring Black Boxes," IEEE Transactions on Knowledge and Data Engineering, vol. 8, no. 2, pp. 211-226, April 1996, doi:10.1109/69.494162