Issue No. 01 - January (2007 vol. 19)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2007.2
QingXiang Wu, IEEE
David A. Bell, IEEE
Girijesh Prasad, IEEE
Thomas Martin McGinnity, IEEE
When symbolic AI approaches are applied to continuous-valued attributes, the continuous values must first be transformed into symbolic data. In this paper, a novel distribution-index-based discretizer is proposed for this transformation. Based on definitions of dichotomic entropy and a compound distributional index, a simple criterion is applied to discretize continuous attributes adaptively. The dichotomic entropy indicates the degree of homogeneity of the decision value distribution and is applied to determine the best splitting point. The compound distributional index combines the degrees of homogeneity of both the attribute value distributions and the decision value distribution, and is applied to determine which interval should be split further; thus, a potentially improved solution to the discretization problem can be found efficiently. Based on multiple reducts in rough set theory, a multiknowledge approach can attain high decision accuracy for information systems with a large number of attributes and missing values. Here, our discretizer is combined with the multiknowledge approach to further improve decision accuracy for information systems with continuous attributes. Experimental results on benchmark data sets show that the new discretizer can improve not only the multiknowledge approach but also the naïve Bayes classifier and the C5.0 tree.
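The split-point step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's definition of dichotomic entropy, so the code below substitutes the standard weighted class entropy of a binary split as a stand-in; it is not the authors' criterion. The names `class_entropy` and `best_split` are hypothetical, and the compound distributional index (which chooses the interval to split next) is not modeled.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Shannon entropy (in bits) of the decision values in `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(values, labels):
    """Find the cut point that minimizes the weighted class entropy.

    ASSUMPTION: weighted class entropy stands in for the paper's
    dichotomic entropy, whose definition is not given in the abstract.
    Returns (cut_point, score), or None if no cut point exists.
    """
    # Sort by attribute value only, so labels never need to be comparable.
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    best = None
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no boundary between equal attribute values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        # Lower score means more homogeneous decision values on each side.
        score = (len(left) * class_entropy(left)
                 + len(right) * class_entropy(right)) / n
        if best is None or score < best[1]:
            best = ((lo + hi) / 2, score)
    return best


if __name__ == "__main__":
    values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
    labels = ["a", "a", "a", "b", "b", "b"]
    print(best_split(values, labels))
```

An adaptive discretizer in the spirit of the abstract would apply such a split repeatedly, each time choosing which interval to refine; that selection rule is where the paper's compound distributional index would come in.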
Data mining, machine learning, information theory, decision support.
G. Prasad, Q. Wu, T. M. McGinnity and D. A. Bell, "A Distribution-Index-Based Discretizer for Decision-Making with Symbolic AI Approaches," in IEEE Transactions on Knowledge & Data Engineering, vol. 19, no. 1, pp. 17-28, 2007.