
Carl G. Looney, "Advances in Feedforward Neural Networks: Demystifying Knowledge Acquiring Black Boxes," IEEE Transactions on Knowledge and Data Engineering, vol. 8, no. 2, pp. 211-226, April 1996.
Abstract—We survey recent research on the supervised training of feedforward neural networks. The goal is to expose how these networks work, how to engineer them so that they learn the data with less extraneous noise, how to train them efficiently, and how to assure that the training is valid. The scope covers gradient descent and polynomial line search, from backpropagation through conjugate gradient and quasi-Newton methods. There is a consensus among researchers that adaptive step gains (learning rates) can stabilize and accelerate convergence, and that a good starting weight set improves both the training speed and the quality of the learning. The training problem includes both the design of a network function and the fitting of that function to a set of input and output data points by computing a set of coefficient weights. The form of the function can be adjusted by adjoining new neurons, pruning existing ones, and setting other parameters such as biases and exponential rates. Our exposition reveals several useful results that are readily implementable.
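The abstract's consensus point about adaptive step gains can be illustrated with a minimal sketch, not taken from the paper, of one classic heuristic of this kind: the "bold driver" rule, which grows the step gain while the error keeps falling and shrinks it (rejecting the step) when the error rises. The function and parameter names are illustrative, and the example minimizes a toy quadratic rather than a network error surface:

```python
def bold_driver_descent(grad, loss, w0, eta0=0.1, up=1.1, down=0.5, steps=200):
    """Gradient descent with an adaptive step gain (learning rate).

    After a successful step (loss did not increase) the gain eta is
    multiplied by `up`; after an overshoot the step is retracted and
    eta is multiplied by `down`.
    """
    w = list(w0)
    eta = eta0
    prev = loss(w)
    for _ in range(steps):
        g = grad(w)
        trial = [wi - eta * gi for wi, gi in zip(w, g)]
        cur = loss(trial)
        if cur <= prev:          # successful step: accept it and speed up
            w, prev = trial, cur
            eta *= up
        else:                    # overshoot: reject the step and slow down
            eta *= down
    return w, prev

# Toy error surface: loss(w) = (w0 - 3)^2 + (w1 + 1)^2, minimum at (3, -1)
loss = lambda w: (w[0] - 3.0) ** 2 + (w[1] + 1.0) ** 2
grad = lambda w: [2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)]

w, final = bold_driver_descent(grad, loss, [0.0, 0.0])
```

Per-weight refinements of this accept-and-grow, reject-and-shrink idea appear in the adaptive-gain literature surveyed here, e.g., Jacobs [28].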
[1] E. Barnard and D. Casasent, "Image Processing for Image Understanding With Neural Networks," Proc. IEEE/INNS Int'l Joint Conf. Neural Networks, vol. 1, pp. 111-115, Washington, D.C., 1989.
[2] E.B. Baum and D. Haussler, "What Size Net Gives Valid Generalization?" Neural Computation, vol. 1, pp. 151-160, 1989.
[3] N.K. Bose and A.K. Garga, "Neural Network Design Using Voronoi Diagrams," IEEE Trans. Neural Networks, vol. 4, no. 5, pp. 778-787, 1993.
[4] N.K. Bose and A.K. Garga, "Neural Network Design Using Voronoi Diagrams: Preliminaries," Int'l Joint IEEE/INNS Conf. Neural Networks, vol. 3, pp. 127-132, Baltimore, 1992.
[5] A.N. Burkitt, "Optimization of the Architecture of Feedforward Neural Networks With Hidden Layers by Unit Elimination," Complex Systems, vol. 5, pp. 371-380, 1991.
[6] G. Castellano, A.M. Fanelli, and M. Pelillo, "An Empirical Comparison of Node Pruning Methods for Layered Feed-Forward Neural Networks," Proc. Int'l Joint Conf. Neural Networks, vol. 1, pp. 321-330, Nagoya, Japan, 1993.
[7] J.P. Cater, "Successfully Using Peak Learning Rates of 10 (and Greater) in Backpropagation Networks With the Heuristic Learning Rule," Proc. First IEEE Int'l Conf. Neural Networks, vol. 2, pp. 645-651, San Diego, 1987.
[8] Y. Chauvin, "A Back-Propagation Algorithm With Optimal Use of Hidden Units," Advances in Neural Information Processing 1, D.S. Touretzky, ed., San Mateo, Calif.: Morgan Kaufmann, pp. 519-526, 1989.
[9] D. Chester, "Why Two Hidden Layers Are Better Than One," Proc. IEEE Joint Int'l Conf. Neural Networks, vol. 1, pp. 265-268, Washington, D.C., 1990.
[10] E.D. Dahl, "Accelerated Learning Using the Generalized Delta Rule," Proc. First IEEE Int'l Conf. Neural Networks, vol. 2, pp. 523-530, San Diego, 1987.
[11] J. de Villiers and E. Barnard, "Backpropagation Neural Nets With One and Two Hidden Layers," IEEE Trans. Neural Networks, vol. 4, no. 1, pp. 136-141, 1992.
[12] H.A.C. Eaton and T.L. Olivier, "Learning Coefficient Dependence on Training Set Size," Neural Networks, vol. 5, pp. 283-288, 1992.
[13] S.E. Fahlman and C. Lebiere, "The Cascade-Correlation Learning Architecture," Tech. Report CMU-CS-90-100, Carnegie Mellon Univ., 1990.
[14] S.E. Fahlman, "An Empirical Study of Learning Speed in Backpropagation," Tech. Report CMU-CS-88-162, Carnegie Mellon Univ., 1988.
[15] L. Fausett, Fundamentals of Neural Networks, Englewood Cliffs, N.J.: Prentice Hall, 1994.
[16] R. Fletcher and C.M. Reeves, "Function Minimization by Conjugate Gradients," Computer J., vol. 7, pp. 149-154, 1964.
[17] R. Fletcher, Practical Methods of Optimization, second ed., John Wiley and Sons, 1987.
[18] P.E. Gill, W. Murray, and M.H. Wright, Practical Optimization, New York: Academic Press, 1981.
[19] M. Gutierrez, J. Wang, and R. Grondin, "Estimating Hidden Unit Number for Two-Layer Perceptrons," Proc. IEEE/INNS Int'l Joint Conf. Neural Networks, vol. 1, pp. 677-681, Washington, D.C., 1989.
[20] M. Hagiwara, "Removal of Hidden Units and Weights for Backpropagation Networks," Proc. Int'l Joint Conf. Neural Networks, vol. 1, pp. 351-354, Nagoya, Japan, 1993.
[21] M. Hayashi, "A Fast Algorithm for the Hidden Units in a Multilayer Perceptron," Proc. Int'l Joint Conf. Neural Networks, vol. 1, pp. 339-342, Nagoya, Japan, 1993.
[22] M.R. Hestenes and E.L. Stiefel, "Methods of Conjugate Gradients for Solving Linear Systems," J. Research Nat'l Bureau of Standards, vol. 49, no. 6, pp. 409-436, 1952.
[23] G.E. Hinton, "Connectionist Learning Procedures," Artificial Intelligence, vol. 40, pp. 185-234, 1989.
[24] M. Hoehfeld and S.E. Fahlman, "Learning With Limited Numerical Precision Using the Cascade-Correlation Algorithm," IEEE Trans. Neural Networks, vol. 3, no. 4, pp. 602-611, 1992.
[25] K. Hornik, M. Stinchcombe, and H. White, "Multilayer Feedforward Networks Are Universal Approximators," Neural Networks, vol. 2, pp. 359-366, 1989.
[26] S.C. Huang and Y.F. Huang, "Bounds on the Number of Hidden Neurons in Multilayer Perceptrons," IEEE Trans. Neural Networks, vol. 2, no. 1, pp. 47-55, 1991.
[27] F.K. Hwang, "Comments on Reliable Loop Topologies for Large Local Computer Networks," IEEE Trans. Computers, vol. 36, no. 3, pp. 383-384, Mar. 1987.
[28] R.A. Jacobs, "Increased Rates of Convergence Through Learning Rate Adaptation," Neural Networks, vol. 1, no. 4, pp. 295-307, 1988.
[29] E.M. Johansson, F.U. Dowla, and D.M. Goodman, "Backpropagation Learning for Multilayer Feedforward Neural Networks Using the Conjugate Gradient Method," Int'l J. Neural Systems, vol. 2, no. 4, pp. 291-301, 1992.
[30] S. Judd, "Learning in Networks Is Hard," Proc. First IEEE Int'l Conf. Neural Networks, vol. 2, pp. 685-692, San Diego, 1987.
[31] E.D. Karnin, "A Simple Procedure for Pruning Backpropagation Trained Neural Networks," IEEE Trans. Neural Networks, vol. 1, no. 2, pp. 239-242, June 1990.
[32] J.K. Kruschke, "Improving Generalization in Backpropagation Networks With Distributed Bottlenecks," Proc. IEEE/INNS Int'l Joint Conf. Neural Networks, vol. 1, pp. 443-447, Washington, D.C., 1989.
[33] S.Y. Kung, K. Diamantaras, W.D. Mao, and J.S. Taur, "Generalized Perceptron Networks With Nonlinear Discriminant Functions," Neural Networks Theory and Applications, R.J. Mammone and Y.Y. Zeevi, eds., New York: Academic Press, pp. 245-279, 1991.
[34] K.D. Wagner, C.K. Chin, and E.J. McCluskey, "Pseudorandom Testing," IEEE Trans. Computers, vol. C-36, pp. 332-343, Mar. 1987.
[35] A. Lapedes, "How Neural Networks Work," Neural Information Processing Systems, pp. 442-456, 1988.
[36] B. Gudmundsson and M. Randen, "Incremental Generation of Projections of CT-Volumes," Proc. First Conf. Visualization and Biomedical Computing, IEEE Press, Piscataway, N.J., pp. 27-34, 1990.
[37] M. Levene and G. Loizou, "Semantics for Null Extended Nested Relations," ACM Trans. Database Systems, vol. 18, no. 3, pp. 414-459, 1993.
[38] G. Li, H. Alnuweiri, and W. Wu, "Acceleration of Backpropagation Through Initial Weight Pre-Training With Delta Rule," Proc. IEEE Int'l Conf. Neural Networks, vol. 1, pp. 580-585, San Francisco, 1993.
[39] C.G. Looney, "Stabilization and Speedup of Convergence in Training Feedforward Neural Networks," Neurocomputing, vol. 10, pp. 7-31, 1996.
[40] C.G. Looney, "Neural Networks As Expert Systems," J. Expert Systems With Applications, vol. 6, no. 2, pp. 129-136, 1993.
[41] J. Makhoul, A. El-Jaroudi, and R. Schwartz, "Formation of Disconnected Decision Regions With a Single Hidden Layer," Proc. IEEE/INNS Int'l Joint Conf. Neural Networks, vol. 1, pp. 455-460, Washington, D.C., 1989.
[42] A. Masahiko, "Mapping Abilities of Three-Layer Neural Networks," Proc. IEEE/INNS Int'l Joint Conf. Neural Networks, vol. 1, pp. 419-423, Washington, D.C., 1989.
[43] C. McCormack and J. Doherty, "Neural Network Super Architectures," Proc. Int'l Joint Conf. Neural Networks, vol. 1, pp. 301-304, Nagoya, Japan, 1993.
[44] K.G. Mehrotra, C.K. Mohan, and S. Ranka, "Bounds on the Number of Samples Needed for Neural Learning," IEEE Trans. Neural Networks, vol. 2, no. 6, pp. 548-558, 1991.
[45] M.L. Minsky and S.A. Papert, Perceptrons, Cambridge, Mass.: MIT Press, 1988.
[46] K.V. Mital, Optimization Methods, New York: Halstead Press, 1976.
[47] M.C. Mozer and P. Smolensky, "Skeletonization: A Technique for Trimming the Fat From a Network Via Relevance Assessment," Advances in Neural Information Processing 1, D.S. Touretzky, ed., San Mateo, Calif.: Morgan Kaufmann, pp. 107-115, 1989.
[48] D. Nguyen and B. Widrow, "Improving the Learning Speed of Two-Layer Neural Networks by Choosing Initial Values of the Adaptive Weights," Proc. IEEE Int'l Joint Conf. Neural Networks, vol. 3, pp. 21-26, San Diego, 1990.
[49] D.B. Parker, "Learning Logic," Technical Report TR-47, MIT Center for Research in Computational Economics and Management Science, Cambridge, Mass., 1985.
[50] A.G. Parlos, B. Fernandez, A.F. Atiya, J. Muthusami, and W.K. Tsai, "An Accelerated Learning Algorithm for Multilayer Perceptron Networks," IEEE Trans. Neural Networks, vol. 5, no. 3, pp. 493-497, 1994.
[51] M. Pelillo and A.M. Fanelli, "A Method of Pruning Layered Feed-Forward Neural Networks," Proc. IWANN (Sitges, Barcelona), Berlin: Springer-Verlag, 1993.
[52] D.C. Plaut, S.J. Nowlan, and G.E. Hinton, "Experiments on Learning by Backpropagation," Technical Report CMU-CS-86-126, Carnegie Mellon Univ., Pittsburgh, Pa., 1986.
[53] H.L. Poh, "A Neural Network Approach for Marketing Strategies Research and Decision Support," PhD thesis, Stanford Univ., 1991.
[54] E. Polak and G. Ribière, "Note sur la convergence de méthodes de directions conjuguées," Revue Française d'Informatique et de Recherche Opérationnelle, vol. 16, pp. 35-43, 1969.
[55] R. Reed, "Pruning Algorithms—A Survey," IEEE Trans. Neural Networks, vol. 4, no. 5, pp. 740-747, Sept. 1993.
[56] M. Riedmiller and H. Braun, "A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm," Proc. IEEE Int'l Conf. Neural Networks (ICNN '93), IEEE, Piscataway, N.J., pp. 586-591, 1993.
[57] H. Robbins and S. Monro, "A Stochastic Approximation Method," Annals Math. Statistics, vol. 22, pp. 400-407, 1951.
[58] D.E. Rumelhart, G.E. Hinton, and R.J. Williams, "Learning Internal Representations by Error Propagation," Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1: Foundations, D.E. Rumelhart, J.L. McClelland, et al., eds., chapter 8, pp. 318-362, Cambridge, Mass.: MIT Press, 1986.
[59] W. Schmidt, S. Raudys, M. Kraaijveld, M. Skurikhina, and R. Duin, "Initializations, Backpropagation and Generalization of Feed-Forward Classifiers," Proc. IEEE Int'l Conf. Neural Networks, vol. 1, pp. 598-604, 1993.
[60] R. Setiono and L.C.K. Hui, "Some n-Bit Parity Problems Are Solvable by Feedforward Networks With Less Than n Hidden Units," Proc. Int'l Joint Conf. Neural Networks, vol. 1, pp. 305-308, Nagoya, Japan, 1993.
[61] D.F. Shanno, "Recent Advances in Numerical Techniques for Large-Scale Optimization," Neural Networks for Control, Cambridge, Mass.: MIT Press, 1990.
[62] Y.L. Shea and C.G. Looney, "Two-Stage Random Optimization of Neural Networks With Sensitivity Based Pruning of Weights," Proc. Golden West Int'l Conf. Intelligent Systems, Reno, Nev., pp. 18-23, 1992.
[63] D.H. Bailey, "Vector Computer Memory Bank Contention," IEEE Trans. Computers, vol. 36, pp. 293-298, 1987.
[64] W.S. Stornetta and B.A. Huberman, "An Improved Three-Layer Backpropagation Algorithm," Proc. First IEEE Int'l Conf. Neural Networks, vol. 2, pp. 645-651, San Diego, 1987.
[65] P.D. Wasserman, Advanced Methods in Neural Computing, New York: Van Nostrand Reinhold, 1993.
[66] R.L. Watrous, "Learning Algorithms for Connectionist Networks: Applied Gradient Methods of Nonlinear Optimization," Proc. First IEEE Int'l Conf. Neural Networks, vol. 2, pp. 619-627, San Diego, 1987.
[67] A.S. Weigend, D.E. Rumelhart, and B.A. Huberman, "Generalization by Weight-Elimination Applied to Currency Exchange Rate Prediction," Proc. Int'l Joint Conf. Neural Networks, vol. 1, pp. 837-841, Seattle, 1991.
[68] A.S. Weigend, B.A. Huberman, and D.E. Rumelhart, "Predicting the Future: A Connectionist Approach," Stanford PDP Research Group Report 90-01, 1990.
[69] A.S. Weigend, D.E. Rumelhart, and B.A. Huberman, "Backpropagation, Weight Elimination, and Time Series Prediction," Proc. Connectionist Models Summer School, pp. 65-80, 1990.
[70] A. Wieland and R. Leighton, "Geometric Analysis of Neural Network Capabilities," Proc. First IEEE Int'l Conf. Neural Networks, vol. 3, pp. 385-392, San Diego, 1987.
[71] P. Werbos, Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences, doctoral dissertation, Harvard Univ., Cambridge, Mass., 1974. Reprinted as P. Werbos, The Roots of Backpropagation: From Ordered Derivatives to Neural Networks and Political Forecasting, New York: John Wiley & Sons, 1994.
[72] D. Whitley and C. Bogart, "The Evolution of Connectivity: Pruning Neural Networks Using Genetic Algorithms," Proc. Int'l Joint Conf. Neural Networks, vol. 1, pp. 134-138, Washington, D.C., 1990.
[73] K. Yamada, H. Kami, and J. Tsukomo, "Handwritten Numeral Recognition by Multilayered Neural Network With Improved Learning Algorithm," IEEE Int'l Joint Conf. Neural Networks, vol. 2, pp. 259-266, Washington, D.C., 1989.
[74] X. Yu, N. Loh, and W. Miller, "A New Acceleration Technique for the Backpropagation Algorithm," Proc. IEEE Int'l Conf. Neural Networks, vol. 3, pp. 1157-1161, San Francisco, 1993.
[75] J.M. Zurada, Introduction to Artificial Neural Systems, West Publishing Company, 1992.
[76] M. Zurada, "Lambda Learning Rule for Feedforward Neural Networks," Proc. IEEE Int'l Conf. Neural Networks, vol. 3, pp. 1808-1811, 1993.