A Statistical-Heuristic Feature Selection Criterion for Decision Tree Induction
August 1991 (vol. 13, no. 8)
pp. 834-841

The authors present a statistical-heuristic feature selection criterion for constructing multibranching decision trees in noisy real-world domains. Real-world problems often involve multivalued features, and for such problems multibranching decision trees provide a more efficient and more comprehensible solution than binary decision trees. The authors propose a statistical-heuristic criterion, the symmetrical tau, and then discuss its consistency with a Bayesian classifier and its built-in statistical test. Combining a proportional-reduction-in-error measure with a cost-of-complexity heuristic makes the symmetrical tau a powerful criterion with many merits, including robustness to noise, fairness to multivalued features, the ability to handle Boolean combinations of logical features, and middle-cut preference. The tau criterion also provides a natural basis for prepruning and dynamic error estimation. Illustrative examples are presented.
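
The abstract does not reproduce the criterion itself, but the symmetrical tau of Goodman and Kruskal [5] has a standard closed form on a feature-by-class contingency table. The sketch below is a minimal illustration of that formulation, not the authors' implementation; the function name and the toy counts are illustrative assumptions.

```python
import numpy as np

def symmetrical_tau(counts):
    """Goodman-Kruskal symmetrical tau for a contingency table
    (rows = values of a candidate feature, columns = classes).

    One standard formulation, with p_ij the joint and p_i., p_.j
    the marginal probabilities (all marginals assumed nonzero):

        tau = (sum_ij p_ij^2/p_.j + sum_ij p_ij^2/p_i.
               - sum_i p_i.^2 - sum_j p_.j^2)
              / (2 - sum_i p_i.^2 - sum_j p_.j^2)

    tau is 0 under independence of feature and class, and 1 under
    perfect mutual predictability.
    """
    p = np.asarray(counts, dtype=float)
    p /= p.sum()                      # joint probabilities p_ij
    row = p.sum(axis=1)               # feature marginals p_i.
    col = p.sum(axis=0)               # class marginals p_.j
    pre = np.sum(row ** 2) + np.sum(col ** 2)
    num = (np.sum(p ** 2 / col[np.newaxis, :])
           + np.sum(p ** 2 / row[:, np.newaxis]) - pre)
    return num / (2.0 - pre)

# Toy use: score candidate features by how well their branches (rows)
# separate the classes (columns); larger tau is better.
noisy = [[12, 10], [11, 9]]           # near-independent feature
clean = [[20, 2], [3, 17]]            # strongly predictive feature
print(symmetrical_tau(noisy), symmetrical_tau(clean))
```

As a proportional-reduction-in-error measure, tau scores a split by how much it reduces prediction error in both directions (feature from class and class from feature), which is what makes it even-handed across features with different numbers of values.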

[1] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees. Belmont, CA: Wadsworth, 1984.
[2] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis. New York: Wiley, 1973.
[3] R. A. Fisher, Statistical Methods for Research Workers, 13th ed. Edinburgh: Oliver and Boyd, 1958.
[4] J. H. Friedman, "A recursive partitioning decision rule for nonparametric classification," IEEE Trans. Comput., vol. C-26, pp. 404-408, Apr. 1977.
[5] L. A. Goodman and W. H. Kruskal, "Measures of association for cross-classifications," J. Amer. Stat. Assoc., vol. 49, pp. 732-764, 1954.
[6] A. E. Hart, "Experience in the use of an inductive system in knowledge engineering," in R&D in Expert Systems, M. A. Bramer, Ed. London: Cambridge University Press, 1985.
[7] C. R. P. Hartmann, P. K. Varshney, K. G. Mehrotra, and C. L. Gerberich, "Application of information theory to the construction of efficient decision trees," IEEE Trans. Inform. Theory, vol. IT-28, pp. 565-577, July 1982.
[8] E. G. Henrichon and K. S. Fu, "A nonparametric partitioning procedure for pattern classification," IEEE Trans. Comput., vol. C-18, pp. 614-624, July 1969.
[9] L. Hyafil and R. L. Rivest, "Constructing optimal binary decision trees is NP-complete," Inform. Processing Lett., vol. 5, no. 1, pp. 15-17, 1976.
[10] L. N. Kanal, "Problem-solving models and search strategies for pattern recognition," IEEE Trans. Patt. Anal. Machine Intell., vol. PAMI-1, pp. 194-201, Apr. 1979.
[11] I. Kononenko, I. Bratko, and E. Roskar, "Experiments in automatic learning of medical diagnostic rules," Jozef Stefan Inst., Yugoslavia, Tech. Rep., 1984.
[12] R. J. Light and B. H. Margolin, "An analysis of variance for categorical data," J. Amer. Stat. Assoc., vol. 66, pp. 534-544, 1971.
[13] B. H. Margolin and R. J. Light, "An analysis of variance for categorical data, II," J. Amer. Stat. Assoc., vol. 69, pp. 755-764, 1974.
[14] J. Mingers, "Expert systems--Rule induction with statistical data," J. Opl. Res. Soc., vol. 38, no. 1, pp. 39-47, 1987.
[15] M. Miyakawa, "Criteria for selecting a variable in the construction of efficient decision trees," IEEE Trans. Comput., vol. 38, Jan. 1989.
[16] B. M. E. Moret, "Decision trees and diagrams," Comput. Surveys, vol. 14, no. 4, pp. 593-623, 1982.
[17] J. N. Morgan and R. C. Messenger, "THAID: A sequential search program for the analysis of nominal scale dependent variables," Inst. Social Res., Univ. Michigan, Ann Arbor, 1973.
[18] J. K. Mui and K. S. Fu, "Automated classification of nucleated blood cells using a binary tree classifier," IEEE Trans. Patt. Anal. Machine Intell., vol. PAMI-2, pp. 429-443, Sept. 1980.
[19] H. J. Payne and W. S. Meisel, "An algorithm for constructing optimal binary decision trees," IEEE Trans. Comput., vol. C-26, pp. 905-916, Sept. 1977.
[20] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81-106, 1986.
[21] J. R. Quinlan, "The effect of noise on concept learning," in Machine Learning, vol. II, R. S. Michalski et al., Eds. Los Altos, CA: Morgan Kaufmann, 1986.
[22] J. R. Quinlan, "Simplifying decision trees," Int. J. Man-Machine Studies, vol. 27, pp. 221-234, 1987.
[23] J. R. Quinlan, "Decision trees and multi-valued attributes," in Machine Intelligence 11, J. E. Hayes, D. Michie, and J. Richards, Eds. Oxford: Oxford University Press, 1988.
[24] E. M. Rounds, "A combined nonparametric approach to feature selection and binary decision tree design," Patt. Recog., vol. 12, pp. 313-317, 1980.
[25] I. K. Sethi and G. P. R. Saravarayudu, "Hierarchical classifier design using mutual information," IEEE Trans. Patt. Anal. Machine Intell., vol. PAMI-4, pp. 441-445, July 1982.
[26] P. H. Swain and H. Hauska, "The decision tree classifier: Design and potential," IEEE Trans. Geosci. Electron., vol. GE-15, pp. 142-147, July 1977.
[27] Q. R. Wang and C. Y. Suen, "Analysis and design of a decision tree based on entropy reduction and its application to large character set recognition," IEEE Trans. Patt. Anal. Machine Intell., vol. PAMI-6, pp. 406-417, July 1984.
[28] X. J. Zhou and T. S. Dillon, "A heuristic-statistical feature selection criterion for inductive machine learning in the real world," in Proc. 1988 IEEE Int. Conf. Syst. Man Cybern. (Beijing, China), 1988.
[29] X. J. Zhou and T. S. Dillon, "Combining artificial intelligence with statistical methods for machine learning in the real world," in Proc. 2nd Int. Workshop Artificial Intell. Stat. (Fort Lauderdale, FL), 1989.
[30] X. J. Zhou and T. S. Dillon, "A comparative study of heuristics for the construction of decision trees," to be published.

Index Terms:
pattern recognition; statistical-heuristic feature selection criterion; decision tree induction; multibranching decision trees; Bayesian classifier; built-in statistical test; proportional-reduction-in-error; cost-of-complexity heuristic; robustness; middle-cut preference; tau criterion; prepruning; dynamic error estimation; Bayes methods; decision theory; statistics; trees (mathematics)
Citation:
X.J. Zhou, T.S. Dillon, "A Statistical-Heuristic Feature Selection Criterion for Decision Tree Induction," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 8, pp. 834-841, Aug. 1991, doi:10.1109/34.85676