
Issue No. 7, July 2009 (vol. 21)

pp: 999-1013

Huanhuan Chen , University of Birmingham, Birmingham

Peter Tiňo , University of Birmingham, Birmingham

Xin Yao , University of Birmingham, Birmingham

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.62

ABSTRACT

An ensemble is a group of learners that work together as a committee to solve a problem. Existing ensemble learning algorithms often generate unnecessarily large ensembles, which consume extra computational resources and may degrade generalization performance. Ensemble pruning algorithms aim to find a good subset of ensemble members to constitute a small ensemble that saves computational resources and performs as well as, or better than, the unpruned ensemble. This paper introduces a probabilistic ensemble pruning algorithm that chooses a set of “sparse” combination weights, most of which are zero, to prune the ensemble. To obtain a sparse set of combination weights while satisfying the nonnegativity constraint on the weights, a left-truncated, nonnegative Gaussian prior is adopted over every combination weight. The expectation propagation (EP) algorithm is employed to approximate the posterior estimation of the weight vector. The leave-one-out (LOO) error is obtained as a by-product of EP training without extra computation and is a good indicator of the generalization error. Therefore, the LOO error is used together with the Bayesian evidence for model selection in this algorithm. An empirical study on several regression and classification benchmark data sets shows that our algorithm uses far fewer component learners yet performs as well as, or better than, the unpruned ensemble. Our results are very competitive with those of other ensemble pruning algorithms.
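The core idea of the abstract can be illustrated with a toy sketch: learn a nonnegative combination-weight vector over ensemble members' predictions, and discard members whose weight is driven to zero. Note this sketch does not implement the paper's method (truncated-Gaussian prior with expectation propagation); it substitutes plain projected-gradient nonnegative least squares as a simplified stand-in, and all names and data in it are illustrative assumptions.

```python
import numpy as np

# Toy illustration of ensemble pruning via sparse, nonnegative combination
# weights. NOTE: the paper uses a left-truncated Gaussian prior with
# expectation propagation; this stand-in uses projected-gradient
# nonnegative least squares followed by thresholding.

rng = np.random.default_rng(0)

# Simulated regression problem: 200 samples of a smooth target y.
n_samples, n_members = 200, 15
y = np.sin(np.linspace(0, 6, n_samples))

# Each ensemble member predicts the target plus noise; the last ten are
# much noisier, so a pruner should concentrate weight on the first five.
noise = np.concatenate([np.full(5, 0.05), np.full(10, 1.0)])
F = y[:, None] + rng.normal(0.0, noise, size=(n_samples, n_members))

# Projected gradient descent for:  min_w ||F w - y||^2  s.t.  w >= 0.
w = np.full(n_members, 1.0 / n_members)        # start from uniform averaging
lr = 1.0 / np.linalg.norm(F.T @ F, 2)          # step size from the Lipschitz constant
for _ in range(2000):
    grad = F.T @ (F @ w - y)
    w = np.maximum(w - lr * grad, 0.0)         # gradient step + projection onto w >= 0

# "Prune" members whose weight is negligible (threshold is illustrative).
keep = w > 1e-3
print("kept members:", np.flatnonzero(keep))
print("pruned ensemble size:", int(keep.sum()), "of", n_members)
```

Because the iteration starts from the uniform-averaging weights and each projected-gradient step is monotone, the pruned weighted combination is guaranteed to fit the training target at least as well as the plain ensemble average, which mirrors the "performs as well as, or better than, the unpruned ensemble" claim on training data.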

INDEX TERMS

Machine learning, probabilistic algorithms, ensemble learning, regression, classification.

CITATION

Huanhuan Chen, Peter Tiňo, Xin Yao, "Predictive Ensemble Pruning by Expectation Propagation",

*IEEE Transactions on Knowledge & Data Engineering*, vol. 21, no. 7, pp. 999-1013, July 2009, doi: 10.1109/TKDE.2009.62