A Comparative Analysis of Methods for Pruning Decision Trees
May 1997 (vol. 19 no. 5)
pp. 476-491

Abstract—In this paper, we address the problem of retrospectively pruning decision trees induced from data according to a top-down approach. This problem has received considerable attention in the areas of pattern recognition and machine learning, and many distinct methods have been proposed in the literature. We make a comparative study of six well-known pruning methods with the aim of understanding their theoretical foundations, their computational complexity, and the strengths and weaknesses of their formulation. Comments on the characteristics of each method are empirically supported. In particular, extensive experiments on several data sets lead us to conclusions on the predictive accuracy of simplified trees that are opposite to some drawn in the literature. We attribute this divergence to differences in experimental designs. Finally, we prove and exploit a property of the reduced error pruning method to obtain an objective evaluation of the tendency to overprune/underprune observed in each method.
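The reduced error pruning method mentioned in the abstract (Quinlan [25]) can be sketched as follows: traverse the tree bottom-up and replace an internal node with a leaf whenever the leaf classifies a held-out pruning set no worse than the subtree does. The `Node` class, helper names, and toy data below are assumptions for illustration only, not the paper's or Quinlan's implementation.

```python
# Minimal sketch of reduced error pruning (REP) on a held-out pruning set.
# Node, classify, and the toy example are hypothetical illustrations.

class Node:
    def __init__(self, label, attr=None, children=None):
        self.label = label               # majority class at this node
        self.attr = attr                 # splitting attribute (None = leaf)
        self.children = children or {}   # attribute value -> child Node

    def is_leaf(self):
        return self.attr is None

    def classify(self, x):
        if self.is_leaf() or x.get(self.attr) not in self.children:
            return self.label
        return self.children[x[self.attr]].classify(x)


def errors(node, pruning_set):
    """Misclassifications of (example, class) pairs on the pruning set."""
    return sum(1 for x, y in pruning_set if node.classify(x) != y)


def reduced_error_prune(node, pruning_set):
    """Bottom-up: replace a subtree with a leaf whenever doing so does
    not increase the error on the held-out pruning set."""
    if node.is_leaf():
        return node
    # First prune the subtrees, routing each example down its branch.
    for value, child in node.children.items():
        subset = [(x, y) for x, y in pruning_set
                  if x.get(node.attr) == value]
        node.children[value] = reduced_error_prune(child, subset)
    # Then compare the pruned subtree against the leaf replacing it.
    leaf = Node(node.label)
    if errors(leaf, pruning_set) <= errors(node, pruning_set):
        return leaf
    return node


# Toy usage: a split on "wind" that does not help on the pruning set,
# so REP collapses the subtree into a single leaf.
tree = Node("yes", attr="wind",
            children={"weak": Node("yes"), "strong": Node("no")})
pruning_set = [({"wind": "weak"}, "yes"), ({"wind": "strong"}, "yes")]
pruned = reduced_error_prune(tree, pruning_set)
print(pruned.is_leaf())  # True: the unhelpful split is pruned away
```

Because each decision is driven solely by held-out error, REP yields the smallest tree that is optimal with respect to the pruning set, which is the property the paper exploits as a baseline for measuring overpruning and underpruning.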

[1] M. Bohanec and I. Bratko, "Trading Accuracy for Simplicity in Decision Trees," Machine Learning, vol. 15, no. 3, pp. 223-250, 1994.
[2] L. Breiman, J. Friedman, R. Olshen, and C. Stone, Classification and Regression Trees. Belmont, Calif.: Wadsworth Int'l, 1984.
[3] W.L. Buntine, A Theory of Learning Classification Rules, PhD thesis, Univ. of Technology, Sydney, 1990.
[4] W.L. Buntine and T. Niblett, "A Further Comparison of Splitting Rules for Decision-Tree Induction," Machine Learning, vol. 8, no. 1, pp. 75-85, 1992.
[5] B. Cestnik, I. Kononenko, and I. Bratko, "ASSISTANT 86: A Knowledge-Elicitation Tool for Sophisticated Users," Progress in Machine Learning—Proc. EWSL-87, I. Bratko and N. Lavrac, eds. Wilmslow: Sigma Press, pp. 31-45, 1987.
[6] B. Cestnik and I. Bratko, "On Estimating Probabilities in Tree Pruning," Machine Learning: EWSL-91, Y. Kodratoff, ed., Lecture Notes in Artificial Intelligence. Berlin: Springer-Verlag, no. 482, pp. 138-150, 1991.
[7] E.S. Edgington, Randomization Tests, 2nd ed., New York, N.Y.: Marcel Dekker, 1987.
[8] B. Efron and G. Gong, "A Leisurely Look at the Bootstrap, the Jackknife, and Cross-Validation," The American Statistician, vol. 37, pp. 36-48, 1983.
[9] F. Esposito, D. Malerba, and G. Semeraro, "Decision Tree Pruning as a Search in the State Space," Machine Learning: ECML-93, P. Brazdil, ed. Lecture Notes in Artificial Intelligence, Berlin: Springer-Verlag, no. 667, pp. 165-184, 1993.
[10] U. Fayyad and K.B. Irani, "The Attribute Selection Problem in Decision Tree Generation," Proc. AAAI-92, pp. 104-110, 1992.
[11] S. Gelfand, C. Ravishankar, and E. Delp, "An Iterative Growing and Pruning Algorithm for Classification Tree Design," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 13, no. 2, pp. 163-174, Feb. 1991.
[12] R.C. Holte, "Very Simple Classification Rules Perform Well on Most Commonly Used Datasets," Machine Learning, vol. 11, no. 1, pp. 63-90, 1993.
[13] R.C. Holte, L.E. Acker, and B.W. Porter, "Concept Learning and the Problem of Small Disjuncts," Proc. 11th Int'l Joint Conf. on Artificial Intelligence, pp. 813-818, 1989.
[14] J. Kittler and P.A. Devijver, "Statistical Properties of Error Estimators in Performance Assessment of Recognition Systems," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 4, no. 2, pp. 215-220, 1982.
[15] R. Kohavi and G.H. John, "Automatic Parameter Selection by Minimizing Estimated Error," Proc. 12th Int'l Conf. on Machine Learning, Lake Tahoe, Calif., pp. 304-312, 1995.
[16] D. Malerba, G. Semeraro, and F. Esposito, "Choosing the Best Pruned Decision Tree: A Matter of Bias," Proc. Fifth Italian Workshop on Machine Learning, Parma, Italy, pp. 33-37, 1994.
[17] D. Malerba, F. Esposito, and G. Semeraro, "A Further Comparison of Simplification Methods for Decision-Tree Induction," Learning From Data: Artificial Intelligence and Statistics V, D. Fisher and H. Lenz, eds., Lecture Notes in Statistics. Berlin: Springer, no. 112, pp. 365-374, 1996.
[18] J. Mingers, "Expert Systems—Rule Induction With Statistical Data," J. Operational Research Society, vol. 38, pp. 39-47, 1987.
[19] J. Mingers, "An Empirical Comparison of Selection Measures for Decision-Tree Induction," Machine Learning, vol. 3, no. 4, pp. 319-342, 1989.
[20] J. Mingers, "An Empirical Comparison of Pruning Methods for Decision Tree Induction," Machine Learning, vol. 4, no. 2, pp. 227-243, 1989.
[21] P.M. Murphy and D.W. Aha, "UCI Repository of Machine Learning Databases [Machine Readable Data Repository]," Univ. of California, Dept. of Information and Computer Science, Irvine, Calif., 1996.
[22] T. Niblett, "Constructing Decision Trees in Noisy Domains," Progress in Machine Learning, Proc. EWSL 87, I. Bratko and N. Lavrac, eds. Wilmslow: Sigma Press, pp. 67-78, 1987.
[23] T. Niblett and I. Bratko, "Learning Decision Rules in Noisy Domains," Proc. Expert Systems 86, Cambridge: Cambridge University Press, 1986.
[24] J.R. Quinlan, "Induction of Decision Trees," Machine Learning, vol. 1, no. 1, pp. 81-106, 1986.
[25] J.R. Quinlan, "Simplifying Decision Trees," Int'l J. Man-Machine Studies, vol. 27, pp. 221-234, 1987.
[26] J.R. Quinlan, C4.5: Programs for Machine Learning. San Mateo, Calif.: Morgan Kaufmann, 1993.
[27] J.R. Quinlan and L.R. Rivest, "Inferring Decision Trees Using the Minimum Description Length Principle," Information and Computation, vol. 80, pp. 227-248, 1989.
[28] J. Rissanen, "A Universal Prior for Integers and Estimation by Minimum Description Length," Annals of Statistics II, pp. 416-431, 1983.
[29] S.R. Safavian and D. Landgrebe, "A Survey of Decision Tree Classifier Methodology," IEEE Trans. Systems, Man, and Cybernetics, vol. 21, no. 3, pp. 660-674, 1991.
[30] C. Schaffer, "Deconstructing the Digit Recognition Problem," Proc. Ninth Int'l Workshop on Machine Learning, pp. 394-399, San Mateo, Calif.: Morgan Kaufmann, 1992.
[31] C. Schaffer, "Sparse Data and the Effect of Overfitting Avoidance in Decision Tree Induction," Proc. AAAI-92, pp. 147-152, 1992.
[32] C. Schaffer, "Overfitting Avoidance As Bias," Machine Learning, vol. 10, no. 2, pp. 153-178, 1993.
[33] C.J.C.H. Watkins, "Combining Cross-Validation and Search," Progress in Machine Learning, Proc. EWSL 87, I. Bratko and N. Lavrac, eds. Wilmslow: Sigma Press, pp. 79-87, 1987.
[34] P.A. White and W.Z. Liu, "Bias in Information-Based Measures in Decision Tree Induction," Machine Learning, vol. 15, no. 3, pp. 321-329, 1994.

Index Terms:
Decision trees, top-down induction of decision trees, simplification of decision trees, pruning and grafting operators, optimal pruning, comparative studies.
Floriana Esposito, Donato Malerba, Giovanni Semeraro, "A Comparative Analysis of Methods for Pruning Decision Trees," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 5, pp. 476-491, May 1997, doi:10.1109/34.589207