An Extended Chi2 Algorithm for Discretization of Real Value Attributes
March 2005 (vol. 17 no. 3)
pp. 437-441
The Variable Precision Rough Sets (VPRS) model is a powerful data mining tool that has been widely applied to knowledge acquisition. Despite its applications in many domains, the VPRS model cannot be applied to real-world classification tasks involving continuous attributes unless the data are first preprocessed by a discretization method. Discretization is an effective technique for handling continuous attributes in data mining, especially in classification problems. The modified Chi2 algorithm is one variant of the Chi2 algorithm: it replaces the inconsistency check in Chi2 with the quality of approximation, a measure taken from Rough Sets Theory (RST), and it takes into account the effect of degrees of freedom. However, classification with a controlled degree of uncertainty, or misclassification error, is outside the realm of RST. The modified Chi2 algorithm also ignores the effect of variance in the two merged intervals. In this study, we propose a new algorithm, named the extended Chi2 algorithm, to overcome these two drawbacks. In experiments run with the See5 software, the proposed algorithm performs better than the original and modified Chi2 algorithms.
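As background, the Chi2 family of algorithms builds on a ChiMerge-style step: compute the chi-square statistic for each pair of adjacent intervals and merge the most similar pair. The following is a minimal sketch of that basic step only, not the extended Chi2 algorithm proposed in the paper. The function names, the class-count representation of an interval, and the handling of empty cells are all assumptions made for illustration; the sketch also omits the significance-level thresholds, the consistency check, and the degrees-of-freedom and variance corrections discussed above.

```python
from typing import List, Sequence


def chi_square(row_a: Sequence[int], row_b: Sequence[int]) -> float:
    """Chi-square statistic for two adjacent intervals.

    row_a[j] and row_b[j] hold the number of examples of class j that
    fall into each interval (hypothetical representation).
    """
    total = sum(row_a) + sum(row_b)
    if total == 0:
        return 0.0
    col_totals = [a + b for a, b in zip(row_a, row_b)]
    stat = 0.0
    for row in (row_a, row_b):
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / total
            if expected == 0:
                # Assumption: cells with zero expected count are skipped.
                continue
            stat += (observed - expected) ** 2 / expected
    return stat


def merge_lowest_pair(intervals: List[List[int]]) -> List[List[int]]:
    """Merge the adjacent pair of intervals with the smallest chi-square.

    `intervals` is a list of per-class count lists, ordered by
    attribute value. A new list is returned; the input is not modified.
    """
    if len(intervals) < 2:
        return list(intervals)
    scores = [
        chi_square(intervals[i], intervals[i + 1])
        for i in range(len(intervals) - 1)
    ]
    i = min(range(len(scores)), key=scores.__getitem__)
    merged = [a + b for a, b in zip(intervals[i], intervals[i + 1])]
    return intervals[:i] + [merged] + intervals[i + 2:]


if __name__ == "__main__":
    # Three intervals, two classes; the first two have similar class mixes.
    print(merge_lowest_pair([[4, 0], [3, 1], [0, 5]]))
```

A low chi-square value means the two intervals have similar class distributions, so merging them loses little class information. In a full discretizer this step would be repeated until a stopping criterion is met.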


Index Terms:
VPRS model, RST, data mining, discretization.
Citation:
Chao-Ton Su, Jyh-Hwa Hsu, "An Extended Chi2 Algorithm for Discretization of Real Value Attributes," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 3, pp. 437-441, March 2005, doi:10.1109/TKDE.2005.39