Generalization by Neural Networks
April 1992 (vol. 4 no. 2)
pp. 177-185

The authors discuss what learning must achieve for a neural network to generalize, a setting in which traditional gradient-descent methods have had limited success. They present a stochastic learning algorithm based on simulated annealing in weight space, verify its convergence properties and feasibility, and describe an implementation together with validation experiments.
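The abstract gives no algorithmic detail, so the following is only a generic illustration of what "simulated annealing in weight space" can look like, not the authors' algorithm. It is a minimal sketch assuming a Metropolis acceptance rule, a geometric cooling schedule, single-weight Gaussian perturbations, and a tiny 2-2-1 network trained on XOR. All names and parameter values (`anneal`, `t0`, `cooling`, `step_size`) are hypothetical choices made for this example.

```python
import math
import random

# Toy training set (XOR); an assumption for illustration, not from the paper.
XOR = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
       ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]


def forward(w, x):
    """2-2-1 network: two tanh hidden units, one logistic output unit."""
    h0 = math.tanh(w[0] * x[0] + w[1] * x[1] + w[2])
    h1 = math.tanh(w[3] * x[0] + w[4] * x[1] + w[5])
    z = w[6] * h0 + w[7] * h1 + w[8]
    # Logistic function written via tanh so large |z| cannot overflow.
    return 0.5 * (1.0 + math.tanh(0.5 * z))


def loss(w, data):
    """Mean squared error of the network over the data set."""
    return sum((forward(w, x) - y) ** 2 for x, y in data) / len(data)


def anneal(data, steps=5000, t0=1.0, cooling=0.999, step_size=0.5, seed=0):
    """Search weight space with Metropolis moves under geometric cooling.

    Returns (best_weights, best_loss, initial_loss).
    """
    rng = random.Random(seed)
    weights = [rng.uniform(-1.0, 1.0) for _ in range(9)]
    current = loss(weights, data)
    initial = current
    best, best_loss = weights[:], current
    t = t0
    for _ in range(steps):
        # Propose a move: perturb one randomly chosen weight.
        candidate = weights[:]
        i = rng.randrange(len(candidate))
        candidate[i] += rng.gauss(0.0, step_size)
        cand_loss = loss(candidate, data)
        delta = cand_loss - current
        # Always accept improvements; accept uphill moves with
        # probability exp(-delta / t), which shrinks as t cools.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            weights, current = candidate, cand_loss
            if current < best_loss:
                best, best_loss = weights[:], current
        t = max(t * cooling, 1e-12)  # geometric cooling with a floor
    return best, best_loss, initial


if __name__ == "__main__":
    w, final, start = anneal(XOR)
    print(f"initial loss {start:.4f}, best loss found {final:.4f}")
```

Unlike gradient descent, this search needs no derivatives and can accept a temporarily worse weight vector, which is what lets it leave poor local minima while the temperature is high. The sketch tracks the best weights seen so far, so the returned loss never exceeds the starting loss.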

Index Terms:
generalization; neural networks; learning; gradient descent; stochastic learning algorithm; simulated annealing; weight space; convergence properties; learning systems; neural nets
S. Shekhar, M.B. Amin, "Generalization by Neural Networks," IEEE Transactions on Knowledge and Data Engineering, vol. 4, no. 2, pp. 177-185, April 1992, doi:10.1109/69.134256