Issue No.07 - July (2009 vol.21)
pp: 999-1013
Huanhuan Chen, University of Birmingham, Birmingham
Peter Tiňo, University of Birmingham, Birmingham
Xin Yao, University of Birmingham, Birmingham
An ensemble is a group of learners that work together as a committee to solve a problem. Existing ensemble learning algorithms often generate unnecessarily large ensembles, which consume extra computational resources and may degrade generalization performance. Ensemble pruning algorithms aim to find a good subset of ensemble members to constitute a small ensemble, which saves computational resources and performs as well as, or better than, the unpruned ensemble. This paper introduces a probabilistic ensemble pruning algorithm that prunes the ensemble by choosing a set of “sparse” combination weights, most of which are zero. To obtain sparse combination weights while satisfying the nonnegativity constraint on them, a left-truncated, nonnegative Gaussian prior is placed over every combination weight. The expectation propagation (EP) algorithm is employed to approximate the posterior of the weight vector. The leave-one-out (LOO) error can be obtained as a by-product of EP training without extra computation and is a good indicator of the generalization error. Therefore, the LOO error is used together with the Bayesian evidence for model selection in this algorithm. An empirical study on several regression and classification benchmark data sets shows that our algorithm uses far fewer component learners but performs as well as, or better than, the unpruned ensemble. Our results are very competitive with those of other ensemble pruning algorithms.
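The idea of pruning through sparse, nonnegative combination weights can be illustrated with a small sketch. This is not the EP algorithm described in the abstract: as a simpler stand-in, it fits the weights by projected gradient descent on a nonnegative least-squares objective and keeps only the members whose weight stays above zero. The toy data, the function names (`combine`, `mse`, `fit_nonnegative_weights`) and the step-size choice are all assumptions made for illustration.

```python
import math


def combine(preds, weights):
    """Weighted ensemble output for each sample (preds[n][j] = member j on sample n)."""
    return [sum(p * w for p, w in zip(row, weights)) for row in preds]


def mse(outputs, targets):
    """Mean squared error between ensemble outputs and targets."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(targets)


def fit_nonnegative_weights(preds, targets, steps=2000):
    """Minimise ||P w - y||^2 / (2n) subject to w >= 0 by projected gradient descent.

    Illustrative stand-in only; the paper approximates a Bayesian posterior with EP.
    """
    n, m = len(preds), len(preds[0])
    # Step size 1/L', where L' = trace(P^T P) / n bounds the largest
    # eigenvalue of the Hessian, so the objective never increases.
    lr = n / sum(p * p for row in preds for p in row)
    w = [1.0 / m] * m  # start from the uniform (unpruned) combination
    for _ in range(steps):
        resid = [o - t for o, t in zip(combine(preds, w), targets)]
        grad = [sum(r * row[j] for r, row in zip(resid, preds)) / n for j in range(m)]
        # Gradient step followed by projection onto the nonnegative orthant.
        w = [max(0.0, wj - lr * gj) for wj, gj in zip(w, grad)]
    return w


# Hypothetical toy ensemble: three noisy copies of the target, one member
# that predicts the negated target, and one constant predictor.
xs = [i / 10 for i in range(40)]
targets = [math.sin(x) for x in xs]
preds = [
    [
        math.sin(x) + 0.1 * math.cos(3 * x),
        math.sin(x) - 0.1 * math.cos(5 * x),
        -math.sin(x),
        0.5,
        math.sin(x) + 0.3 * math.cos(7 * x),
    ]
    for x in xs
]

weights = fit_nonnegative_weights(preds, targets)
kept = [j for j, w in enumerate(weights) if w > 0.0]  # indices of surviving members

print("weights:", [round(w, 4) for w in weights])
print("kept members:", kept)
print("uniform MSE:", mse(combine(preds, [0.2] * 5), targets))
print("weighted MSE:", mse(combine(preds, weights), targets))
```

Members whose weight is driven to exactly zero by the projection drop out of the ensemble without changing its output. The paper's approach differs in that it places a prior over each weight and selects the model using the LOO error and the Bayesian evidence, rather than solving a single deterministic optimisation.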
Machine learning, probabilistic algorithms, ensemble learning, regression, classification.
Huanhuan Chen, Peter Tiňo, Xin Yao, "Predictive Ensemble Pruning by Expectation Propagation", IEEE Transactions on Knowledge & Data Engineering, vol.21, no. 7, pp. 999-1013, July 2009, doi:10.1109/TKDE.2009.62