Approximation, Dimension Reduction, and Nonconvex Optimization Using Linear Superpositions of Gaussians
October 1993 (vol. 42 no. 10)
pp. 1222-1233

This paper concerns neural network approaches to function approximation and optimization using linear superpositions of Gaussians (popularly known as radial basis function (RBF) networks). The problem of function approximation is that of estimating an underlying function f from samples of the form (y_i, x_i), i = 1, 2, ..., n, with y_i = f(x_i). When the dimension of the input is high and the number of samples is small, estimating the function becomes difficult because samples are sparse in local regions. The authors find that this problem of high dimensionality can be overcome to some extent by using linear transformations of the input inside the Gaussian kernels. Such transformations induce intrinsic dimension reduction and can be exploited for identifying key factors of the input and for the phase-space reconstruction of dynamical systems, without explicitly computing the dimension and delay. They present a generalization that uses multiple linear projections onto scalars and successive RBF networks (MLPRBF) that estimate the function from these scalar values. They also derive key properties of RBF networks that provide suitable grounds for implementing efficient search strategies for nonconvex optimization within the same framework.
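To make the idea of linear projections inside the Gaussian kernels concrete, here is a minimal numerical sketch. It is not the authors' MLPRBF algorithm: the fixed random projections, the least-squares fit of only the output weights, and all variable names are illustrative assumptions. It approximates a target function with a superposition of Gaussians of the form a_k * exp(-(w_k . x - c_k)^2 / (2 s_k^2)), where each w_k projects the high-dimensional input onto a scalar.

```python
# Hedged sketch: superposition of Gaussians on linear projections of the input.
# Projections w_k, centers c_k, and widths s_k are fixed; only the output
# weights a_k are fit by least squares. All choices here are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Example target: depends on a one-dimensional projection of a 5-D input.
    return np.sin(x[:, 0] + 2 * x[:, 1])

n, d, K = 200, 5, 30                      # samples, input dimension, Gaussians
X = rng.uniform(-1, 1, size=(n, d))
y = f(X)

W = rng.normal(size=(K, d))               # projection vectors w_k
P = X @ W.T                               # projected (scalar) inputs, shape (n, K)
C = rng.uniform(P.min(), P.max(), size=K) # centers c_k along each projected axis
S = np.full(K, 0.5)                       # widths s_k

Phi = np.exp(-((P - C) ** 2) / (2 * S ** 2))   # Gaussian design matrix
a, *_ = np.linalg.lstsq(Phi, y, rcond=None)    # output weights a_k

X_test = rng.uniform(-1, 1, size=(50, d))
Phi_test = np.exp(-(((X_test @ W.T) - C) ** 2) / (2 * S ** 2))
print("test RMSE:", np.sqrt(np.mean((Phi_test @ a - f(X_test)) ** 2)))
```

Because each kernel responds only to a scalar projection of the input rather than to the full input vector, the fit is far less sensitive to the nominal input dimension, which is the intuition behind the intrinsic dimension reduction discussed in the abstract.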

Index Terms:
linear superpositions of Gaussians; dimension reduction; nonconvex optimization; neural network approaches; function approximation; radial basis function; neural nets; optimisation; polynomials.
Citation:
A. Saha, Chuan-Lin Wu, Dun-Sung Tang, "Approximation, Dimension Reduction, and Nonconvex Optimization Using Linear Superpositions of Gaussians," IEEE Transactions on Computers, vol. 42, no. 10, pp. 1222-1233, Oct. 1993, doi:10.1109/12.257708