Issue No. 03 - March 2013 (vol. 25)
pp. 494-501
Ludmila I. Kuncheva , Bangor University, Bangor
ABSTRACT
Kappa-error diagrams are used to gain insight into why one ensemble method performs better than another on a given data set. A point on the diagram corresponds to a pair of classifiers: the x-axis is the pairwise diversity (kappa), and the y-axis is the averaged individual error of the pair. In this study, kappa is calculated from the 2×2 correct/wrong contingency matrix. We derive a lower bound on kappa which determines the feasible part of the kappa-error diagram. Simulations and experiments with real data show that there is unoccupied feasible space on the diagram corresponding to (hypothetical) better ensembles, and that individual accuracy is the leading factor in improving ensemble accuracy.
INDEX TERMS
Classification, Diversity methods, Image color analysis, Decision trees, Mathematical model, Feature extraction, Kappa-error diagrams, Limits, Classifier ensembles, Ensemble diversity
CITATION
Ludmila I. Kuncheva, "A Bound on Kappa-Error Diagrams for Analysis of Classifier Ensembles", IEEE Transactions on Knowledge & Data Engineering, vol.25, no. 3, pp. 494-501, March 2013, doi:10.1109/TKDE.2011.234
[26] G. Valentini and M. Re, "Ensemble Methods: A Review," Advances in Machine Learning and Data Mining for Astronomy, Data Mining and Knowledge Discovery, M.J Way, J.D. Scargle, K.M. Ali, and A.N. Srivastava, eds. Chapman & Hall/CRC Press, 2012.