Optimized Feature Extraction and the Bayes Decision in Feed-Forward Classifier Networks
April 1991 (vol. 13 no. 4)
pp. 355-364

The problem of multiclass pattern classification using adaptive layered networks is addressed. A special class of networks, namely feed-forward networks with a linear final layer, which perform generalized linear discriminant analysis, is discussed. This class is sufficiently generic to encompass the behavior of arbitrary feed-forward nonlinear networks. Training the network consists of a least-squares approach that combines a generalized inverse computation, to solve for the final-layer weights, with a nonlinear optimization scheme, to solve for the parameters of the nonlinearities. A general analytic form for the feature extraction criterion is derived, and it is interpreted for specific forms of target coding and error weighting. An important aspect of the approach is to show how a priori information regarding nonuniform class membership, uneven distribution between training and test sets, and misclassification costs may be exploited in a regularized manner in the training phase of networks.
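The split between the linear final layer and the nonlinear parameters can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the tanh hidden units, the function names, the toy two-class data, and the 1-of-c target coding are all assumptions made for illustration. For fixed hidden-layer parameters, the final-layer weights follow from a pseudo-inverse; the residual error then depends only on the hidden-layer parameters, which an outer nonlinear optimizer (not shown) would adjust.

```python
import numpy as np

def hidden_features(X, W, b):
    """Hidden-layer outputs (tanh is an assumed nonlinearity), plus a bias column."""
    H = np.tanh(X @ W + b)
    return np.hstack([H, np.ones((X.shape[0], 1))])

def solve_final_layer(H, T):
    """Least-squares final-layer weights via the Moore-Penrose pseudo-inverse."""
    return np.linalg.pinv(H) @ T

def residual_error(X, T, W, b):
    """Sum-of-squares error after the final layer has been solved for.

    This is the quantity an outer nonlinear optimizer would minimize
    with respect to the hidden-layer parameters W and b.
    """
    H = hidden_features(X, W, b)
    A = solve_final_layer(H, T)
    return float(np.sum((H @ A - T) ** 2)), A

# Toy data (hypothetical): two Gaussian clusters, 20 samples each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, (20, 2)), rng.normal(2.0, 0.5, (20, 2))])
labels = np.repeat([0, 1], 20)
T = np.eye(2)[labels]            # assumed 1-of-c target coding

# Random, untrained hidden-layer parameters with 3 hidden units.
W = rng.normal(size=(2, 3))
b = rng.normal(size=3)

err, A = residual_error(X, T, W, b)
pred = np.argmax(hidden_features(X, W, b) @ A, axis=1)
print("final-layer weight shape:", A.shape)
print("sum-of-squares error:", err)
```

Solving the final layer in closed form leaves the outer search with only the hidden-layer parameters to adjust. Changing the target matrix `T`, or weighting the rows of the error, is where the target coding and error weighting mentioned in the abstract would enter such a sketch.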

Index Terms:
pattern recognition; feature extraction; Bayes decision; feed-forward classifier networks; multiclass pattern classification; adaptive layered networks; least-square approach; nonlinear optimization; target coding; error weighting; adaptive systems; Bayes methods; decision theory; encoding; optimisation
D. Lowe, A.R. Webb, "Optimized Feature Extraction and the Bayes Decision in Feed-Forward Classifier Networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 4, pp. 355-364, April 1991, doi:10.1109/34.88570