IDD: A Supervised Interval Distance-Based Method for Discretization
September 2008 (vol. 20 no. 9)
pp. 1230-1238
Francisco J. Ruiz, UPC - ESAII, Vilanova i la Geltrú
Cecilio Angulo, UPC - ESAII, Vilanova i la Geltrú
Núria Agell, ESADE, Barcelona
This article introduces a new method for supervised discretization based on interval distances, using a novel concept of neighbourhood in the target's space. The proposed method takes into account the order of the class attribute, when one exists, so it can be used with ordinal discrete classes as well as with continuous classes, as in regression problems. The method has proved to be very accurate and faster than the supervised discretization methods most commonly used in the literature. It is illustrated through several examples, and it is compared with other standard discretization methods on three public data sets using two different learning tasks: a decision tree algorithm and SVM for regression.

Index Terms:
Interval arithmetic, Clustering, classification, and association rules, Mining methods and algorithms
Francisco J. Ruiz, Cecilio Angulo, Núria Agell, "IDD: A Supervised Interval Distance-Based Method for Discretization," IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 9, pp. 1230-1238, Sept. 2008, doi:10.1109/TKDE.2008.66