Induction By Attribute Elimination
September/October 1999 (vol. 11 no. 5)
pp. 805-812

Abstract—In most data-mining applications where induction is the primary tool for extracting knowledge from real-world databases, it is difficult to identify a complete set of relevant attributes precisely. This paper introduces a new rule induction algorithm, Rule Induction Two In One (RITIO), which eliminates attributes in order of decreasing irrelevancy. Like ID3-style decision tree construction algorithms, RITIO uses the entropy measure to constrain the hypothesis search space; unlike those algorithms, however, its hypothesis language is the rule structure, and RITIO generates rules without constructing decision trees. The final concept description produced by RITIO is shown to be based largely on only the most relevant attributes. Experimental results confirm that, even on noisy industrial databases, RITIO achieves high levels of predictive accuracy.
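The abstract describes ranking attributes by irrelevancy using the entropy measure [13], as ID3 does [11]. The following is a minimal sketch of that ranking idea only, not the authors' RITIO implementation: attributes whose partitions leave the class entropy highest (i.e., lowest information gain) are the most irrelevant and would be eliminated first. All function and variable names here are illustrative assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy of a sequence of class labels [13].
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def conditional_entropy(rows, labels, attr):
    # Expected class entropy after partitioning the examples on `attr`.
    n = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    return sum(len(g) / n * entropy(g) for g in groups.values())

def rank_by_irrelevancy(rows, labels):
    # Order attributes from most to least irrelevant: highest conditional
    # entropy (lowest information gain) first, mirroring the elimination
    # order of decreasing irrelevancy described in the abstract.
    attrs = rows[0].keys()
    return sorted(attrs,
                  key=lambda a: conditional_entropy(rows, labels, a),
                  reverse=True)

# Toy dataset: `shape` carries no class information, `color` determines it,
# so `shape` is ranked first for elimination.
rows = [
    {"color": "red",  "shape": "square"},
    {"color": "red",  "shape": "circle"},
    {"color": "blue", "shape": "square"},
    {"color": "blue", "shape": "circle"},
]
labels = ["pos", "pos", "neg", "neg"]
print(rank_by_irrelevancy(rows, labels))  # → ['shape', 'color']
```

Unlike ID3, which would use the same gain statistic to pick a split attribute for a tree node, RITIO uses the ranking to discard attributes and induce rules directly.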

[1] K.M. Ali and M.J. Pazzani, “Reducing the Small Disjuncts Problem by Learning Probabilistic Concept Descriptions,” Computational Learning Theory and Natural Learning Systems, T. Petsche et al., eds., vol. 3, 1992.
[2] P. Clark and R. Boswell, “Rule Induction with CN2: Some Recent Improvements,” Machine Learning—EWSL-91, Y. Kodratoff, ed., pp. 151–163, Berlin: Springer-Verlag, 1991.
[3] J. Dougherty, R. Kohavi, and M. Sahami, “Supervised and Unsupervised Discretization of Continuous Features,” Proc. 12th Int'l Conf. Machine Learning, pp. 194–202, 1995.
[4] M. Gams, M. Drobnic, and M. Petkovsek, “Learning from Examples—A Uniform View,” Int'l J. Man-Machine Studies, vol. 34, pp. 49–68, 1991.
[5] J. Hong, “AE1: An Extension Matrix Approximate Method for the General Covering Problem,” Int'l J. Computer and Information Sciences, vol. 14, no. 6, pp. 421–437, 1985.
[6] R.S. Michalski, I. Mozetic, J. Hong, and N. Lavrac, “The Multi-Purpose Incremental Learning System AQ15 and Its Testing Application to Three Medical Domains,” Proc. Fifth Nat'l Conf. Artificial Intelligence, pp. 1,041–1,045, 1986.
[7] R.S. Michalski, “Variable-Valued Logic and Its Applications to Pattern Recognition and Machine Learning,” Computer Science and Multiple-Valued Logic Theory and Applications, D.C. Rine, ed., pp. 506–534. Amsterdam: North-Holland, 1975.
[8] P.M. Murphy and D.W. Aha, “UCI Repository of Machine Learning Databases, Machine-Readable Data Repository,” Dept. of Information and Computer Science, Univ. of California, Irvine, Calif., 1995.
[9] C. Yang and S. Hasegawa, “FITNESS: Failure Immunization Technology for Network Services Survivability,” Proc. IEEE GLOBECOM, pp. 1,549–1,554, 1988.
[10] G. Pagallo and D. Haussler, “Boolean Feature Discovery in Empirical Learning,” Machine Learning, vol. 5, pp. 71–99, 1990.
[11] J.R. Quinlan, “Induction of Decision Trees,” Machine Learning, vol. 1, pp. 81–106, 1986.
[12] J.R. Quinlan, C4.5: Programs for Machine Learning, San Mateo, Calif.: Morgan Kaufmann, 1992.
[13] C.E. Shannon and W. Weaver, The Mathematical Theory of Communication, Univ. of Illinois Press, Urbana, Ill., 1949.
[14] P.E. Utgoff, “Shift of Bias for Inductive Concept Learning,” Machine Learning: An AI Approach, vol. 2, chapter 5, pp. 107–148, Morgan Kaufmann, 1986.
[15] X. Wu, Knowledge Acquisition from Databases. Ablex, 1995.
[16] X. Wu and P. Måhlén, “Fuzzy Interpretation of Induction Results,” Proc. 1995 Int'l Conf. Knowledge Discovery and Data Mining (KDD-95), pp. 325–330, Montreal, Aug. 1995.
[17] X. Wu, J. Krisár, and P. Måhlén, “Noise Handling with Extension Matrices,” Int'l J. Artificial Intelligence Tools, vol. 5, no. 1, pp. 81–97, 1996.

Index Terms:
Data mining, rule induction, attribute selection, information entropy.
Xindong Wu, David Urpani, "Induction By Attribute Elimination," IEEE Transactions on Knowledge and Data Engineering, vol. 11, no. 5, pp. 805-812, Sept.-Oct. 1999, doi:10.1109/69.806938