On the Generalization Ability of Neural Network Classifiers
June 1994 (vol. 16, no. 6), pp. 659-663

This correspondence presents a method for evaluating artificial neural network (ANN) classifiers. To assess the performance of a network over its entire input range, a probabilistic input model is defined, and the expected error of the output over this range is taken as the measure of generalization ability. Two elements are essential to the proposed evaluation technique: estimation of the input probability density and numerical integration. A nonparametric method based on the M nearest neighbors is used to locally estimate the distribution around each training pattern, and an orthogonalization procedure is used to determine the covariance matrices of the local densities. The numerical integration is performed by a Monte Carlo method. The proposed technique has been applied to investigate the generalization ability of back-propagation (BP), radial basis function (RBF), and probabilistic neural network (PNN) classifiers on three test problems.
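The procedure the abstract describes (estimate a local density around each training pattern, then use Monte Carlo integration to compute the expected error under that input model) can be sketched as follows. This is a simplified illustration, not the authors' implementation: isotropic kernel widths derived from the M-th nearest neighbor stand in for the paper's orthogonalized local covariance matrices, and a nearest-mean rule stands in for the BP/RBF/PNN classifiers under test. All names and parameter values (`M`, the sample count, the toy data) are illustrative assumptions.

```python
# Hedged sketch of the evaluation idea: build a variable-kernel input
# density from the training patterns, then Monte Carlo-estimate the
# classifier's expected error under that density.
import numpy as np

rng = np.random.default_rng(0)

# Toy training set: two Gaussian classes in 2-D (illustrative only).
X0 = rng.normal(loc=[-1.0, 0.0], scale=0.5, size=(30, 2))
X1 = rng.normal(loc=[+1.0, 0.0], scale=0.5, size=(30, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 30 + [1] * 30)

def local_bandwidths(X, M=5):
    """Per-pattern kernel width from the distance to the M-th nearest
    neighbor: a scalar stand-in for the paper's local covariance
    estimation via orthogonalization."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    d.sort(axis=1)               # row-wise; column 0 is the zero self-distance
    return d[:, M]

def sample_input_model(X, h, n, rng):
    """Draw n Monte Carlo points from the mixture of isotropic Gaussian
    kernels centered on the training patterns."""
    idx = rng.integers(0, len(X), size=n)          # pick a kernel per sample
    Z = X[idx] + rng.normal(size=(n, X.shape[1])) * h[idx, None]
    return idx, Z

def nearest_mean_classifier(X, y):
    """Trivial classifier used here in place of a trained BP/RBF/PNN net."""
    means = np.stack([X[y == c].mean(axis=0) for c in (0, 1)])
    def predict(Z):
        d = np.linalg.norm(Z[:, None, :] - means[None, :, :], axis=-1)
        return d.argmin(axis=1)
    return predict

h = local_bandwidths(X, M=5)
idx, Z = sample_input_model(X, h, n=20000, rng=rng)
predict = nearest_mean_classifier(X, y)

# Expected error: fraction of Monte Carlo samples the classifier assigns
# to a class other than that of the kernel (training pattern) which
# generated them.
expected_error = np.mean(predict(Z) != y[idx])
print(f"Monte Carlo estimate of expected error: {expected_error:.3f}")
```

Because the two toy classes are well separated relative to the kernel widths, the estimated expected error comes out small; increasing the class overlap or the bandwidths raises it, which is exactly the sensitivity the evaluation technique is meant to expose.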

[1] R. Lippmann, "Pattern classification using neural networks," IEEE Commun. Mag., vol. 27, no. 11, 1989.
[2] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning internal representations by error propagation," in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vols. 1 and 2. Cambridge, MA: MIT Press, 1986.
[3] M. J. D. Powell, "Radial basis functions for multivariate interpolation: A review," in Algorithms for Approximation, J. C. Mason and M. G. Cox, Eds. Oxford: Clarendon, 1987.
[4] C. A. Micchelli, "Interpolation of scattered data: Distance matrices and conditionally positive definite functions," Constructive Approximation, vol. 2, pp. 11-22, 1986.
[5] D. S. Broomhead and D. Lowe, "Multivariable functional interpolation and adaptive networks," Complex Syst., vol. 2, pp. 321-355, 1988.
[6] J. Moody and C. J. Darken, "Fast learning in networks of locally tuned processing units," Neural Computat., vol. 1, pp. 281-294, 1989.
[7] D. F. Specht, "Probabilistic neural networks and the polynomial adaline as complementary techniques for classification," IEEE Trans. Neural Networks, vol. 1, no. 1, pp. 111-121, 1990.
[8] D. F. Specht, "Generation of polynomial discriminant functions for pattern recognition," IEEE Trans. Electron. Comput., vol. EC-16, pp. 308-319, 1967.
[9] G. Vrckovnik, C. R. Carter, and S. Haykin, "Radial basis function classification of impulse radar waveforms," in Proc. Int. Joint Conf. Neural Netw., vol. I, June 1990, pp. 45-50.
[10] T. Leen, M. Rudnick, and D. Hammerstrom, "Hebbian feature discovery improves classifier efficiency," in Proc. Int. Joint Conf. Neural Netw., vol. I, June 1990, pp. 51-56.
[11] N. Hataoka and A. H. Waibel, "Speaker independent phoneme recognition on TIMIT database using integrated time-delay neural networks (TDNNs)," in Proc. Int. Joint Conf. Neural Netw., vol. I, June 1990, pp. 57-62.
[12] H. J. Shyu, J. M. Libert, and S. D. Mann, "Classifying seismic signals via RCE neural network," in Proc. Int. Joint Conf. Neural Netw., vol. I, June 1990, pp. 101-106.
[13] C. Deng and S. Haykin, "A multi-layer neural network classifier for radar clutter," in Proc. Int. Joint Conf. Neural Netw., vol. I, June 1990, pp. 241-246.
[14] K. Fukunaga and R. R. Hayes, "Estimation of classifier performance," IEEE Trans. Pattern Anal. Machine Intell., vol. 11, no. 10, pp. 1087-1101, Oct. 1989.
[15] P. A. Lachenbruch and R. M. Mickey, "Estimation of error rates in discriminant analysis," Technometrics, vol. 10, pp. 1-11, 1968.
[16] B. Efron, "Bootstrap methods: Another look at the Jackknife," Ann. Statist., vol. 7, no. 1, pp. 1-26, 1979.
[17] W. S. Meisel, Computer-Oriented Approaches to Pattern Recognition. New York: Academic Press, 1972, vol. 83.
[18] M. D. Richard and R. P. Lippmann, "Neural network classifiers estimate Bayesian a posteriori probabilities," Neural Computat., vol. 3, pp. 461-483, 1991.
[19] S. Yoshimoto, "A study on artificial neural network generalization capability," in Proc. Int. Joint Conf. Neural Netw., vol. III, June 1990, pp. 689-694.
[20] V. A. Epanechnikov, "Nonparametric estimation of a multidimensional probability density," Theory of Probab. Applicat., vol. 14, pp. 153-158, 1969.
[21] G. Strang, Linear Algebra and Its Applications. New York: Academic Press, 1976.
[22] G. H. Golub and C. F. Van Loan, Matrix Computations. Baltimore, MD: Johns Hopkins Univ. Press, 1989.
[23] M. T. Musavi, W. Ahmed, K. H. Chan, K. B. Faris, and D. M. Hummels, "On the training of radial basis function classifiers," Neural Networks, vol. 5, pp. 595-603, 1992.
[24] K. H. Chan, "A probabilistic model for evaluation of neural network classifiers," M.S. thesis, Elect. Eng. Dept., Univ. of Maine, 1990.
[25] R. Billinton and R. N. Allan, Reliability Evaluation of Engineering Systems. New York: Plenum Press, 1992.
[26] Y. H. Yu and R. F. Simmons, "Descending epsilon in back propagation: A technique for better generalization," in Proc. Int. Joint Conf. Neural Netw., June 1990, pp. 167-172.
[27] M. T. Musavi, K. Faris, K. H. Chan, and W. Ahmed, "On the implementation of RBF technique in neural networks," in Proceedings of the Analysis of Neural Network Applications (ANNA) (George Mason Univ., Fairfax, VA). New York: ACM Press, May 1991, pp. 110-115.

Index Terms:
pattern recognition; neural nets; generalisation (artificial intelligence); probability; integration; Monte Carlo methods; generalization ability; neural network classifiers; expected error; input probability density; numerical integration; covariance matrices; Monte Carlo method; back propagation; radial basis function; probabilistic neural network
M.T. Musavi, K.H. Chan, D.M. Hummels, K. Kalantri, "On the Generalization Ability of Neural Network Classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, no. 6, pp. 659-663, June 1994, doi:10.1109/34.295911