Constrained Cascade Generalization of Decision Trees
June 2004 (vol. 16 no. 6)
pp. 727-739
Huimin Zhao and Sudha Ram

Abstract—While decision tree techniques have been widely used in classification applications, a shortcoming of many decision tree inducers is that they do not learn intermediate concepts: at each node, only one of the original features is involved in the branching decision. Combining decision tree inducers with other classification methods that do learn intermediate concepts can produce more flexible decision boundaries between classes, potentially improving classification accuracy. We propose a generic algorithm for cascade generalization of decision tree inducers that takes the maximum cascading depth as a parameter constraining the degree of cascading. Previously proposed cascading methods, i.e., loose coupling and tight coupling, are strictly special cases of this new algorithm. We empirically evaluated the proposed algorithm, using logistic regression and C4.5 as base inducers, on 32 UCI data sets and found that neither loose coupling nor tight coupling is always the best cascading strategy and that the maximum cascading depth can be tuned for better classification accuracy. We also empirically compared the proposed algorithm with ensemble methods such as bagging and boosting and found that it performs marginally better than bagging and boosting on average.
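
The following is a minimal Python sketch of the constrained cascading idea described in the abstract. It assumes scikit-learn's LogisticRegression as the base inducer and uses a simple Gini-based axis-parallel splitter standing in for C4.5; the names cascade_node, predict_one, and max_cascade_depth are illustrative, not taken from the paper. Under this parameterization, max_cascade_depth=1 fits the base learner once at the root and grows the rest of the tree on the extended data (loose coupling), while a value at least as large as the tree depth re-fits it at every internal node (tight coupling); intermediate values give the constrained variants in between.

import numpy as np
from sklearn.linear_model import LogisticRegression

def gini(y):
    # Gini impurity of an integer-labeled class vector.
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    # Exhaustive axis-parallel split search minimizing weighted Gini impurity.
    best_j, best_t, best_score = None, None, np.inf
    n = len(y)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:  # candidate thresholds
            left = X[:, j] <= t
            score = (left.sum() * gini(y[left]) + (~left).sum() * gini(y[~left])) / n
            if score < best_score:
                best_j, best_t, best_score = j, t, score
    return best_j, best_t

def cascade_node(X, y, depth, max_cascade_depth, max_tree_depth=10, min_leaf=5):
    # Stop on pure nodes, small nodes, or the overall tree-depth limit
    # (y is assumed to hold non-negative integer class labels).
    if len(np.unique(y)) == 1 or len(y) < 2 * min_leaf or depth >= max_tree_depth:
        return {"leaf": True, "label": np.bincount(y).argmax()}
    lr = None
    if depth < max_cascade_depth:
        # Constrained cascading: fit the base inducer on this node's local data
        # and append its class-probability estimates as constructed features.
        # The extended matrix is passed down, so the new features remain
        # available to all descendants.
        lr = LogisticRegression(max_iter=1000).fit(X, y)
        X = np.hstack([X, lr.predict_proba(X)])
    j, t = best_split(X, y)
    if j is None:  # no informative split found
        return {"leaf": True, "label": np.bincount(y).argmax()}
    left = X[:, j] <= t
    return {
        "leaf": False, "feature": j, "threshold": t, "lr": lr,
        "left": cascade_node(X[left], y[left], depth + 1, max_cascade_depth,
                             max_tree_depth, min_leaf),
        "right": cascade_node(X[~left], y[~left], depth + 1, max_cascade_depth,
                              max_tree_depth, min_leaf),
    }

def predict_one(node, x):
    # Re-apply each cascading ancestor's base model in path order, so the
    # feature vector always matches the space in which the node's split was chosen.
    while not node["leaf"]:
        if node["lr"] is not None:
            x = np.concatenate([x, node["lr"].predict_proba(x[None, :])[0]])
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["label"]

A hypothetical usage example, on one of the UCI data sets bundled with scikit-learn:

from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
tree = cascade_node(X, y, depth=0, max_cascade_depth=2)
pred = np.array([predict_one(tree, x) for x in X])
print("training accuracy:", (pred == y).mean())
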

[1] E. Bauer and R. Kohavi, "An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants," Machine Learning, vol. 36, nos. 1-2, pp. 105-139, 1999.
[2] C.L. Blake and C.J. Merz, UCI Repository of Machine Learning Databases, http://www.ics.uci.edu/~mlearn/MLRepository.html, 1998.
[3] J.C. Bioch, O. van der Meer, and R. Potharst, "Bivariate Decision Trees," Principles of Data Mining and Knowledge Discovery, Lecture Notes in Artificial Intelligence 1263, J. Komorowski and J. Zytkow, eds., Springer-Verlag, pp. 232-243, 1997.
[4] L. Breiman, "Bagging Predictors," Machine Learning, vol. 24, no. 2, pp. 123-140, 1996.
[5] C.E. Brodley and P.E. Utgoff, "Multivariate Decision Trees," Machine Learning, vol. 19, no. 1, pp. 45-77, 1995.
[6] T.G. Dietterich, "Ensemble Methods in Machine Learning," Proc. First Int'l Workshop Multiple Classifier Systems, pp. 1-15, 2000.
[7] T.G. Dietterich, "An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization," Machine Learning, vol. 40, no. 2, pp. 139-157, 2000.
[8] P. Domingos, "A Unified Bias-Variance Decomposition and Its Applications," Proc. 17th Int'l Conf. Machine Learning, pp. 231-238, 2000.
[9] I.P. Fellegi and A.B. Sunter, "A Theory of Record Linkage," J. American Statistical Assoc., vol. 64, pp. 1183-1210, 1969.
[10] Y. Freund and R.E. Schapire, "Experiments with a New Boosting Algorithm," Proc. 13th Int'l Conf. Machine Learning, pp. 148-156, 1996.
[11] J. Gama, "Discriminant Trees," Proc. 16th Int'l Conf. Machine Learning, pp. 134-142, 1999.
[12] J. Gama and P. Brazdil, "Cascade Generalization," Machine Learning, vol. 41, no. 3, pp. 315-343, 2000.
[13] S. Geman, E. Bienenstock, and R. Doursat, "Neural Networks and the Bias/Variance Dilemma," Neural Computation, vol. 4, pp. 1-48, 1992.
[14] D. Heath, S. Kasif, and S. Salzberg, "Induction of Oblique Decision Trees," Proc. 13th Int'l Joint Conf. Artificial Intelligence, pp. 1002-1007, 1993.
[15] D.W. Hosmer and S. Lemeshow, Applied Logistic Regression, second ed. John Wiley & Sons, 2000.
[16] G.H. John, "Robust Linear Discriminant Trees," Learning from Data: Artificial Intelligence and Statistics V, Lecture Notes in Statistics, D. Fisher and H. Lenz, eds., Springer-Verlag, pp. 375-385, 1996.
[17] R. Kohavi, "A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection," Proc. 14th Int'l Joint Conf. Artificial Intelligence, pp. 1137-1143, 1995.
[18] R. Kohavi and D.H. Wolpert, "Bias Plus Variance Decomposition for Zero-One Loss Functions," Proc. 13th Int'l Conf. Machine Learning, pp. 275-283, 1996.
[19] S.K. Murthy, "Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey," Data Mining and Knowledge Discovery, vol. 2, no. 4, pp. 345-389, 1998.
[20] S.K. Murthy, S. Kasif, and S. Salzberg, "A System for Induction of Oblique Decision Trees," J. Artificial Intelligence Research, vol. 2, pp. 1-32, 1994.
[21] J.R. Quinlan, "Discovering Rules by Induction from Large Collections of Examples," Expert Systems in the Micro Electronic Age, D. Michie, ed., Edinburgh Univ. Press, 1979.
[22] J.R. Quinlan, "Induction of Decision Trees," Machine Learning, vol. 1, no. 1, pp. 81-106, 1986.
[23] J.R. Quinlan, C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
[24] S. Ram and H. Zhao, "Detecting Both Schema-Level and Instance-Level Correspondences for the Integration of E-Catalogs," Proc. 11th Ann. Workshop Information Technology and Systems (WITS '01), pp. 193-198, 2001.
[25] S.M. Weiss and C.A. Kulikowski, Computer Systems That Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning, and Expert Systems. Morgan Kaufmann, 1991.
[26] J. Wickramaratna, S. Holden, and B. Buxton, "Performance Degradation in Boosting," Proc. Second Int'l Workshop Multiple Classifier Systems, pp. 11-21, 2001.
[27] I.H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, 2000.
[28] D.H. Wolpert, "The Relationship Between PAC, the Statistical Physics Framework, the Bayesian Framework, and the VC Framework," The Mathematics of Generalization: Proc. SFI/CNLS Workshop on Formal Approaches to Supervised Learning, D.H. Wolpert, ed., Addison-Wesley, pp. 117-214, 1994.
[29] O.T. Yildiz and E. Alpaydin, "Linear Discriminant Trees," Proc. 17th Int'l Conf. Machine Learning, pp. 1175-1182, 2000.
[30] H. Zhao and S. Ram, "Entity Identification for Heterogeneous Database Integration: A Multiple Classifier System Approach and Empirical Evaluation," Information Systems, 2004.

Index Terms:
Machine learning, data mining, classification, decision tree, cascade generalization.
Citation:
Huimin Zhao, Sudha Ram, "Constrained Cascade Generalization of Decision Trees," IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 6, pp. 727-739, June 2004, doi:10.1109/TKDE.2004.3