A Distribution-Index-Based Discretizer for Decision-Making with Symbolic AI Approaches
January 2007 (vol. 19 no. 1)
pp. 17-28
When symbolic AI approaches are applied to continuous-valued attributes, the attribute values must first be transformed into symbolic data. This paper proposes a novel distribution-index-based discretizer for that transformation. Based on definitions of dichotomic entropy and a compound distributional index, a simple criterion is applied to discretize continuous attributes adaptively. The dichotomic entropy indicates the degree of homogeneity of the decision value distribution and is used to determine the best splitting point. The compound distributional index combines the homogeneity degrees of the attribute value distributions and of the decision value distribution, and is used to determine which interval should be split further; in this way, a potentially improved solution to the discretization problem can be found efficiently. Based on multiple reducts in rough set theory, a multiknowledge approach can attain high decision accuracy for information systems with a large number of attributes and missing values. Here, the discretizer is combined with the multiknowledge approach to further improve decision accuracy for information systems with continuous attributes. Experimental results on benchmark data sets show that the new discretizer can improve not only the multiknowledge approach but also the naïve Bayes classifier and the C5.0 tree.
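
The abstract does not give the formula for the dichotomic entropy, so the following is only a minimal sketch of the general idea of choosing a splitting point by the homogeneity of the decision values on each side of a cut. It assumes the ordinary Shannon class entropy, weighted by the size of the two sides, as a stand-in criterion; the function names `class_entropy` and `best_split` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Shannon entropy (in bits) of the decision values in `labels`.

    Assumption: this is a stand-in for the paper's dichotomic entropy,
    whose exact definition is not given in the abstract.
    """
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(values, labels):
    """Return (cut_point, weighted_entropy) for the best binary cut.

    Candidate cuts are midpoints between consecutive distinct attribute
    values. Returns None when all attribute values are equal.
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut is possible between equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        # Size-weighted average entropy of the two sides; lower is more homogeneous.
        score = (len(left) * class_entropy(left)
                 + len(right) * class_entropy(right)) / n
        if best is None or score < best[1]:
            cut = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (cut, score)
    return best


if __name__ == "__main__":
    values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
    labels = ["a", "a", "a", "b", "b", "b"]
    print(best_split(values, labels))
```

An adaptive discretizer of the kind described would apply such a split repeatedly, using a second criterion (in the paper, the compound distributional index) to decide which interval to split next; that selection step is not sketched here because the abstract does not define the index.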

Index Terms:
Data mining, machine learning, information theory, decision support.
QingXiang Wu, David A. Bell, Girijesh Prasad, Thomas Martin McGinnity, "A Distribution-Index-Based Discretizer for Decision-Making with Symbolic AI Approaches," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 1, pp. 17-28, Jan. 2007, doi:10.1109/TKDE.2007.2