A Modified Chi2 Algorithm for Discretization
May/June 2002 (vol. 14 no. 3)
pp. 666-670

Abstract—Since the ChiMerge algorithm was first proposed by Kerber in 1992, it has become a widely used and discussed discretization method. The Chi2 algorithm is a modification of the ChiMerge method. It automates the discretization process by introducing an inconsistency rate as the stopping criterion, and it automatically selects the significance value. In addition, it adds a finer phase aimed at feature selection to broaden the applications of the ChiMerge algorithm. However, the Chi2 algorithm does not consider the inaccuracy inherent in ChiMerge's merging criterion, and the user-defined inconsistency rate also introduces inaccuracy into the discretization process. This paper first discusses these two drawbacks and then proposes modifications to overcome them. Comparative results using C4.5 show that the modified Chi2 algorithm performs better than the original Chi2 algorithm and becomes a completely automatic discretization method.
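The two mechanisms the abstract refers to can be sketched in code: the χ² merging criterion that ChiMerge and Chi2 share, and the inconsistency rate that Chi2 uses as its stopping criterion. The sketch below is illustrative, not the authors' implementation; the function names and the default threshold of 3.84 (the χ² value at the 0.05 significance level with 1 degree of freedom, i.e. two classes) are assumptions for the example.

```python
from collections import Counter

def chi2_stat(a, b):
    """Chi-square statistic for a pair of adjacent intervals.
    a, b: Counter mapping class label -> frequency within the interval."""
    classes = set(a) | set(b)
    n = sum(a.values()) + sum(b.values())
    stat = 0.0
    for row in (a, b):
        r = sum(row.values())          # row total (interval size)
        for c in classes:
            col = a[c] + b[c]          # column total (class size)
            e = r * col / n            # expected frequency
            if e > 0:
                stat += (row[c] - e) ** 2 / e
    return stat

def chimerge(values, labels, threshold=3.84):
    """ChiMerge-style bottom-up merging for one numeric attribute.
    Returns the lower bounds of the resulting intervals (cut points)."""
    # Start with one interval per distinct value.
    intervals = []                     # list of (lower_bound, class Counter)
    for v, y in sorted(zip(values, labels)):
        if intervals and intervals[-1][0] == v:
            intervals[-1][1][y] += 1
        else:
            intervals.append((v, Counter({y: 1})))
    # Repeatedly merge the adjacent pair with the lowest chi-square
    # until every pair exceeds the significance threshold.
    while len(intervals) > 1:
        stats = [chi2_stat(intervals[i][1], intervals[i + 1][1])
                 for i in range(len(intervals) - 1)]
        i = min(range(len(stats)), key=stats.__getitem__)
        if stats[i] >= threshold:
            break
        lo, merged = intervals[i]
        intervals[i:i + 2] = [(lo, merged + intervals[i + 1][1])]
    return [lo for lo, _ in intervals]

def inconsistency_rate(patterns, labels):
    """Chi2's stopping criterion: the fraction of instances that are not
    in the majority class of their discretized attribute pattern."""
    groups = {}
    for p, y in zip(patterns, labels):
        groups.setdefault(tuple(p), Counter())[y] += 1
    total = sum(sum(c.values()) for c in groups.values())
    inconsistent = sum(sum(c.values()) - max(c.values())
                       for c in groups.values())
    return inconsistent / total
```

In the Chi2 scheme, merging would continue at progressively less strict significance levels for as long as the inconsistency rate of the discretized data stays below the user-given bound; the modified algorithm discussed in this paper replaces that user-defined bound, making the process fully automatic.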

[1] R. Kohavi, “Bottom-Up Induction of Oblivious Read-Once Decision Graphs: Strengths and Limitation,” Proc. 12th Nat'l Conf. Artificial Intelligence, pp. 613-618, 1994.
[2] J. Dougherty, R. Kohavi, and M. Sahami, “Supervised and Unsupervised Discretization of Continuous Features,” Machine Learning: Proc. 12th Int'l Conf., pp. 194-202, 1995.
[3] M. Chmielewski and J. Grzymala-Busse, “Global Discretization of Continuous Attributes as Preprocessing for Machine Learning,” Int'l J. Approximate Reasoning, vol. 15, no. 4, pp. 319-331, Nov. 1996.
[4] U.M. Fayyad and K.B. Irani, “Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning,” Proc. 13th Int'l Joint Conf. Artificial Intelligence, pp. 1022-1027, 1993.
[5] R. Kerber, “ChiMerge: Discretization of Numeric Attributes,” Proc. Ninth Int'l Conf. Artificial Intelligence, pp. 123-128, 1992.
[6] H. Liu and R. Setiono, "Feature Selection via Discretization of Numeric Attributes," IEEE Trans. Knowledge and Data Eng., vol. 9, no. 4, July/Aug. 1997.
[7] K.M. Risvik, “Discretization of Numerical Attributes, Preprocessing for Machine Learning,” project report, Knowledge Systems Group, Dept. of Computer Systems and Telematics, The Norwegian Inst. of Technology, Univ. of Trondheim, 1997.
[8] Z. Pawlak, “Rough Sets,” Int'l J. Computer and Information Sciences, vol. 11, no. 5, pp. 341-356, 1982.
[9] D.C. Montgomery and G.C. Runger, Applied Statistics and Probability for Engineers, second ed. John Wiley & Sons, Inc., 1999.
[10] K. Slowinski, “Rough Classification of HSV Patients,” Intelligent Decision Support—Handbook of Applications and Advances of the Rough Sets Theory, chapter 6, pp. 77-94, Kluwer Academic, 1992.
[11] C.J. Merz and P.M. Murphy, UCI Repository of Machine Learning Databases.
[12] J.R. Quinlan, “Improved Use of Continuous Attributes in C4.5,” J. Artificial Intelligence Research, vol. 4, pp. 77-90, 1996.
[13] J.R. Quinlan, C4.5: Programs for Machine Learning. San Mateo, Calif.: Morgan Kaufmann, 1992.
[14] S. Weiss and C. Kulikowski, Computer Systems That Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning, and Expert Systems, Morgan Kaufmann, 1991.
[15] R. Kohavi and M. Sahami, “Error-Based and Entropy-Based Discretization of Continuous Features,” Proc. Second Int'l Conf. Knowledge Discovery & Data Mining, pp. 114-119, 1996.

Index Terms:
Discretization, degree of freedom, χ² test
F.E.H. Tay, L. Shen, "A Modified Chi2 Algorithm for Discretization," IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 3, pp. 666-670, May-June 2002, doi:10.1109/TKDE.2002.1000349