Use of Contextual Information for Feature Ranking and Discretization
September-October 1997 (vol. 9 no. 5)
pp. 718-730

Abstract: Deriving classification rules or decision trees from examples is an important problem. When there are too many features, it is highly desirable to discard weak features before the derivation process. Numeric features must also be discretized before rules can be generated. We present a new approach to these problems. Traditional techniques use feature merits based on either the information-theoretic or the statistical correlation between each feature and the class. We instead assign merits to features by finding each feature's "obligation" to the class discrimination in the context of other features. The merits are then used to rank the features, select a feature subset, and discretize the numeric variables. Experience with benchmark example sets demonstrates that the new approach is a powerful alternative to the traditional methods. The paper concludes by posing some new technical issues that arise from this approach.
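
The contrast the abstract draws between per-feature merits and merits assigned "in the context of other features" can be illustrated with a small sketch. This is not the paper's algorithm: the function names, the XOR toy data, and the 1/D^2 weighting of differing features are assumptions chosen only to show why a context-free score can miss features that matter jointly.

```python
import math
from collections import Counter
from itertools import combinations, product


def mutual_information(xs, ys):
    """Mutual information (in bits) between one discrete feature and the class."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def contextual_merits(rows, labels):
    """Hypothetical context-based merit (an assumption, not the paper's method).

    For every pair of examples from different classes, each feature that
    differs between the two examples receives credit 1 / D^2, where D is the
    number of differing features. A feature that alone separates a close
    pair therefore earns more than one that differs alongside others.
    """
    n_features = len(rows[0])
    merits = [0.0] * n_features
    for i, j in combinations(range(len(rows)), 2):
        if labels[i] == labels[j]:
            continue  # only counter-class pairs carry discrimination duty
        differs = [a != b for a, b in zip(rows[i], rows[j])]
        d = sum(differs)
        if d == 0:
            continue  # identical examples with conflicting labels: no credit
        for f in range(n_features):
            if differs[f]:
                merits[f] += 1.0 / (d * d)
    return merits


# Toy data: class = x0 XOR x1; x2 is irrelevant noise.
rows = list(product([0, 1], repeat=3))
labels = [x0 ^ x1 for x0, x1, _ in rows]

# Context-free score: each feature judged on its own against the class.
mi = [mutual_information([r[f] for r in rows], labels) for f in range(3)]

# Context-based score: features judged on pairs of counter-class examples.
merits = contextual_merits(rows, labels)

print("mutual information:", mi)
print("contextual merits: ", merits)
```

On this toy set every feature has zero mutual information with the class, so a per-feature ranking cannot tell the two XOR inputs from the noise feature, while the pairwise score ranks x0 and x1 above x2.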

Index Terms:
Feature analysis, classification modeling, discretization, feature merit, feature selection.
Se June Hong, "Use of Contextual Information for Feature Ranking and Discretization," IEEE Transactions on Knowledge and Data Engineering, vol. 9, no. 5, pp. 718-730, Sept.-Oct. 1997, doi:10.1109/69.634751