Issue No. 12, December 2010 (vol. 22)

pp: 1738-1751

Huanhuan Chen , University of Birmingham, Birmingham

Xin Yao , University of Birmingham, Birmingham

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2010.26

ABSTRACT

Negative Correlation Learning (NCL) [1], [2] is a neural network ensemble learning algorithm that adds a correlation penalty term to the cost function of each individual network, so that each network minimizes its mean-square error (MSE) together with its correlation with the rest of the ensemble. This paper describes NCL in detail and observes that NCL corresponds to training the entire ensemble as a single learning machine that minimizes the MSE without any regularization. This insight explains why NCL is prone to overfitting noise in the training set. The paper analyzes this problem and proposes the multiobjective regularized negative correlation learning (MRNCL) algorithm, which incorporates an additional regularization term for the ensemble and uses an evolutionary multiobjective algorithm to design ensembles. In MRNCL, we define the crossover and mutation operators and adopt a nondominated sorting algorithm with fitness sharing and rank-based fitness assignment. Experiments on synthetic data as well as real-world data sets demonstrate that MRNCL achieves better performance than NCL, especially when the noise level in the data set is nontrivial. In the experimental discussion, we give three reasons why our algorithm outperforms others.
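The trade-off the abstract describes can be written out as a short sketch. This is the standard NCL formulation from the literature, not equations quoted from this page, and the MRNCL objective vector below is an illustrative reading of "MSE, correlation, plus a regularization term" (the notation $f_i$, $\bar{f}$, $\lambda$, $\mathbf{w}_i$ is assumed):

```latex
% NCL: the i-th network minimizes its MSE plus a correlation penalty p_i,
% where \bar{f}(x_n) is the ensemble mean output on training point x_n.
e_i \;=\; \frac{1}{2}\sum_{n=1}^{N}\bigl(f_i(x_n)-y_n\bigr)^2
        \;+\; \lambda \sum_{n=1}^{N} p_i(n),
\qquad
p_i(n) \;=\; \bigl(f_i(x_n)-\bar{f}(x_n)\bigr)\sum_{j\neq i}\bigl(f_j(x_n)-\bar{f}(x_n)\bigr).

% MRNCL (illustrative): instead of folding everything into one cost,
% treat accuracy, diversity, and regularization as separate objectives
% and let the evolutionary multiobjective algorithm trade them off:
\min_{\mathbf{w}_i}\;
\Bigl(\;\underbrace{\textstyle\sum_n \bigl(f_i(x_n)-y_n\bigr)^2}_{\text{error}},\;
       \underbrace{\textstyle\sum_n p_i(n)}_{\text{correlation}},\;
       \underbrace{\|\mathbf{w}_i\|^2}_{\text{regularization}}\;\Bigr).
```

With $\lambda$ at its usual upper setting, the penalty makes the single-machine interpretation visible: summing $e_i$ over all networks recovers the MSE of the ensemble output alone, which is why the regularization objective is needed to control overfitting.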

INDEX TERMS

Multiobjective algorithm, multiobjective learning, neural network ensembles, neural networks, negative correlation learning, regularization.

CITATION

Huanhuan Chen, Xin Yao, "Multiobjective Neural Network Ensembles Based on Regularized Negative Correlation Learning",

*IEEE Transactions on Knowledge & Data Engineering*, vol. 22, no. 12, pp. 1738-1751, December 2010, doi:10.1109/TKDE.2010.26

REFERENCES

- [1] Y. Liu and X. Yao, "Ensemble Learning via Negative Correlation," Neural Networks, vol. 12, no. 10, pp. 1399-1404, 1999.
- [2] Y. Liu and X. Yao, "Simultaneous Training of Negatively Correlated Neural Networks in an Ensemble," IEEE Trans. Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 29, no. 6, pp. 716-725, Nov. 1999.
- [3] L.K. Hansen and P. Salamon, "Neural Network Ensembles," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 12, no. 10, pp. 993-1001, Oct. 1990.
- [4] X. Yao, M. Fischer, and G. Brown, "Neural Network Ensembles and Their Application to Traffic Flow Prediction in Telecommunications Networks," Proc. Int'l Joint Conf. Neural Networks, pp. 693-698, 2001.
- [5] Y. Liu, X. Yao, and T. Higuchi, "Evolutionary Ensembles with Negative Correlation Learning," IEEE Trans. Evolutionary Computation, vol. 4, no. 4, pp. 380-387, Nov. 2000.
- [6] L. Breiman, "Bagging Predictors," Machine Learning, vol. 24, no. 2, pp. 123-140, 1996.
- [7] R.E. Schapire, "A Brief Introduction to Boosting," Proc. 16th Int'l Joint Conf. Artificial Intelligence, pp. 1401-1406, 1999.
- [8] T.K. Ho, "The Random Subspace Method for Constructing Decision Forests," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 8, pp. 832-844, Aug. 1998.
- [9] H.A. Abbass, "A Memetic Pareto Evolutionary Approach to Artificial Neural Networks," Proc. 14th Australian Joint Conf. Artificial Intelligence, pp. 1-12, 2000.
- [10] R. McKay and H.A. Abbass, "Anti-Correlation: A Diversity Promoting Mechanism in Ensemble Learning," The Australian J. Intelligent Information Processing Systems, vol. 3, no. 4, pp. 139-149, 2001.
- [11] H.A. Abbass, "Speeding Up Backpropagation Using Multiobjective Evolutionary Algorithms," Neural Computation, vol. 15, no. 11, pp. 2705-2726, 2003.
- [12] Y. Jin, T. Okabe, and B. Sendhoff, "Neural Network Regularization and Ensembling Using Multi-Objective Evolutionary Algorithms," Proc. IEEE Congress on Evolutionary Computation (CEC '04), pp. 1-8, 2004.
- [13] M.M. Islam, X. Yao, and K. Murase, "A Constructive Algorithm for Training Cooperative Neural Network Ensembles," IEEE Trans. Neural Networks, vol. 14, no. 4, pp. 820-834, July 2003.
- [14] G. Brown, J. Wyatt, and P. Tino, "Managing Diversity in Regression Ensembles," J. Machine Learning Research, vol. 6, pp. 1621-1650, 2005.
- [15] A. Chandra and X. Yao, "Evolving Hybrid Ensembles of Learning Machines for Better Generalisation," Neurocomputing, vol. 69, nos. 7-9, pp. 686-700, 2006.
- [16] A. Chandra and X. Yao, "Ensemble Learning Using Multi-Objective Evolutionary Algorithms," J. Math. Modelling and Algorithms, vol. 5, no. 4, pp. 417-445, 2006.
- [17] L.S. Oliveira, M. Morita, R. Sabourin, and F. Bortolozzi, "Multi-Objective Genetic Algorithms to Create Ensemble of Classifiers," Proc. Third Int'l Conf. Evolutionary Multi-Criterion Optimization, vol. 87, pp. 592-606, 2005.
- [18] N. García, C. Hervás, and D. Ortiz, "Cooperative Coevolution of Artificial Neural Network Ensembles for Pattern Classification," IEEE Trans. Evolutionary Computation, vol. 9, no. 3, pp. 271-302, June 2005.
- [19] H. Chen and X. Yao, "Evolutionary Random Neural Ensemble Based on Negative Correlation Learning," Proc. IEEE Congress on Evolutionary Computation (CEC '07), pp. 1468-1474, 2007.
- [20] H.H. Dam, H.A. Abbass, C. Lokan, and X. Yao, "Neural-Based Learning Classifier Systems," IEEE Trans. Knowledge and Data Eng., vol. 20, no. 1, pp. 26-39, Jan. 2008.
- [21] H. Chen and X. Yao, "Regularized Negative Correlation Learning for Neural Network Ensembles," IEEE Trans. Neural Networks, vol. 20, no. 12, pp. 1962-1979, Dec. 2009.
- [22] A. Krogh and J.A. Hertz, "A Simple Weight Decay Can Improve Generalization," Advances in Neural Information Processing Systems, vol. 4, pp. 950-957, Morgan Kaufmann, 1992.
- [23] A. Krogh and J. Vedelsby, "Neural Network Ensembles, Cross Validation, and Active Learning," Advances in Neural Information Processing Systems, vol. 7, pp. 231-238, Morgan Kaufmann, 1995.
- [24] V.N. Vapnik, The Nature of Statistical Learning Theory. Springer-Verlag, 1995.
- [25] S. Geman, E. Bienenstock, and R. Doursat, "Neural Networks and the Bias/Variance Dilemma," Neural Computation, vol. 4, no. 1, pp. 1-58, 1992.
- [26] D. Goldberg, K. Deb, H. Kargupta, and G. Harik, "Rapid, Accurate Optimization of Difficult Problems Using Fast Messy Genetic Algorithms," Proc. Fifth Int'l Conf. Genetic Algorithms, pp. 56-64, 1993.
- [27] E. Zitzler, L. Thiele, M. Laumanns, C.M. Fonseca, and V.G. da Fonseca, "Performance Assessment of Multiobjective Optimizers: An Analysis and Review," IEEE Trans. Evolutionary Computation, vol. 7, no. 2, pp. 117-132, Apr. 2003.
- [28] N. Srinivas and K. Deb, "Multiobjective Function Optimization Using Nondominated Sorting Genetic Algorithms," Evolutionary Computation, vol. 2, no. 3, pp. 221-248, 1995.
- [29] P. Darwen and X. Yao, "Every Niching Method Has Its Niche: Fitness Sharing and Implicit Sharing Compared," Proc. Parallel Problem Solving from Nature (PPSN) IV, pp. 398-407, 1996.
- [30] J. Baker, "Adaptive Selection Methods for Genetic Algorithms," Proc. Int'l Conf. Genetic Algorithms and Their Applications, pp. 100-111, 1985.
- [31] B.D. Ripley, Pattern Recognition and Neural Networks. Cambridge Univ. Press, 1996.
- [32] R.B. Gramacy and H.K.H. Lee, "Gaussian Processes and Limiting Linear Models," Computational Statistics & Data Analysis, vol. 53, no. 1, pp. 123-136, Sept. 2008.
- [33] U. Yule, "On the Association of Attributes in Statistics," Philosophical Trans. Royal Soc. London, Series A, vol. 194, pp. 257-319, 1900.
- [34] A. Asuncion and D. Newman, "UCI Machine Learning Repository," http://www.ics.uci.edu/~mlearn/MLRepository.html, 2007.
- [35] K. Deb, S. Agrawal, A. Pratap, and T. Meyarivan, "A Fast Elitist Non-Dominated Sorting Genetic Algorithm for Multi-Objective Optimization: NSGA-II," Parallel Problem Solving from Nature, vol. VI, pp. 849-858, Springer, 2000.
- [36] X. Yao and Y. Liu, "Making Use of Population Information in Evolutionary Artificial Neural Networks," IEEE Trans. Systems, Man and Cybernetics, Part B: Cybernetics, vol. 28, no. 3, pp. 417-425, June 1998.
- [37] H. Chen, P. Tiňo, and X. Yao, "Predictive Ensemble Pruning by Expectation Propagation," IEEE Trans. Knowledge and Data Eng., vol. 21, no. 7, pp. 999-1013, July 2009.
- [38] J. Demšar, "Statistical Comparisons of Classifiers over Multiple Data Sets," J. Machine Learning Research, vol. 7, pp. 1-30, 2006.
- [39] M. Friedman, "The Use of Ranks to Avoid the Assumption of Normality Implicit in the Analysis of Variance," J. Am. Statistical Assoc., vol. 32, pp. 675-701, 1937.
- [40] R.L. Iman and J.M. Davenport, "Approximations of the Critical Region of the Friedman Statistic," Comm. Statistics, vol. 9, pp. 571-595, 1980.
- [41] O.J. Dunn, "Multiple Comparisons Among Means," J. Am. Statistical Assoc., vol. 56, pp. 52-64, 1961.
- [42] C. Igel and M. Hüsken, "Improving the Rprop Learning Algorithm," Proc. Second ICSC Int'l Symp. Neural Computation, pp. 115-121, 2000.