Optimal Linear Combination of Neural Networks for Improving Classification Performance
February 2000 (vol. 22, no. 2)
pp. 207-215

Abstract—With a focus on classification problems, this paper presents a new method for linearly combining multiple neural network classifiers based on statistical pattern recognition theory. In our approach, several neural networks are first selected, for each class, according to which minimizes the classification error for that class. They are then linearly combined to form an ideal classifier that exploits the strengths of the individual classifiers. The minimum classification error (MCE) criterion is used to estimate the optimal linear weights. Because the classification decision rule is incorporated into the cost function in this formulation, a combination of weights better suited to the classification objective can be obtained. Experimental results on artificial and real data sets show that the proposed method constructs a combined classifier that outperforms the best single classifier in terms of overall classification error on test data.
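As a rough illustration of the kind of combination the abstract describes, the sketch below linearly combines per-network class scores and scores the result with a sigmoid-smoothed MCE-style loss. This is a minimal sketch under stated assumptions, not the paper's algorithm: the names (combine, mce_loss, outputs, xi) are illustrative, and the paper's gradient-based weight search is replaced here by a generic optimizer.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative sketch -- not the paper's notation or training procedure.
# M networks, N samples, K classes.

def combine(outputs, w):
    """Linearly combine per-network class scores.

    outputs: (M, N, K) array of class scores from M trained networks.
    w:       (M,) array of linear combination weights.
    Returns combined scores of shape (N, K).
    """
    return np.tensordot(w, outputs, axes=1)

def mce_loss(w, outputs, labels, xi=5.0):
    """Sigmoid-smoothed classification-error cost.

    For each sample the misclassification measure is
        d = -g_true + max over j != true class of g_j,
    so d > 0 corresponds to a misclassification. The sigmoid turns
    the 0/1 error count into a cost that is differentiable in w.
    """
    g = combine(outputs, w)                  # (N, K) combined scores
    idx = np.arange(len(labels))
    g_true = g[idx, labels]                  # score of the correct class
    rivals = g.copy()
    rivals[idx, labels] = -np.inf            # mask out the correct class
    d = -g_true + rivals.max(axis=1)         # misclassification measure
    return np.mean(1.0 / (1.0 + np.exp(-xi * d)))

# Hypothetical usage: tune w on held-out scores outputs_val (M, N, K)
# and labels labels_val (N,), starting from uniform weights.
# w0 = np.ones(M) / M
# res = minimize(mce_loss, w0, args=(outputs_val, labels_val))
```

The point of the smoothing is that the decision rule (pick the class with the highest combined score) appears inside the loss itself, so minimizing it directly targets classification error rather than a squared-error surrogate.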


Index Terms:
Pattern classification, ensemble learning, linear combination, minimum classification error discriminant, neural network.
Citation:
Naonori Ueda, "Optimal Linear Combination of Neural Networks for Improving Classification Performance," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 2, pp. 207-215, Feb. 2000, doi:10.1109/34.825759