
Issue No.09 - September (2008 vol.20)

pp: 1230-1238

Cecilio Angulo, UPC - ESAII, Vilanova i la Geltrú

Francisco J. Ruiz, UPC - ESAII, Vilanova i la Geltrú

DOI: http://doi.ieeecomputersociety.org/10.1109/TKDE.2008.66

ABSTRACT

This article introduces a new method for supervised discretization based on interval distances, using a novel concept of neighbourhood in the target's space. The proposed method takes the order of the class attribute into account when such an order exists, so it can be used with ordinal discrete classes as well as with continuous classes in regression problems. The method has proved to be very accurate and faster than the supervised discretization methods most commonly used in the literature. It is illustrated through several examples, and it is compared with other standard discretization methods on three public data sets using two different learning algorithms: a decision tree algorithm and support vector machine (SVM) regression.
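As a rough illustration of the general idea of interval-distance-based supervised discretization, the Python sketch below scores each candidate cut point by how different the target values look on either side of it. It is not the IDD algorithm from the paper. The neighbourhood definition (the k nearest samples on each side in sorted attribute order), the summary of each neighbourhood as the interval [mean - sd, mean + sd], the Euclidean distance on the (centre, radius) form of an interval, and the names `summarize`, `interval_distance`, `score_cut_points` and `select_cut_points` are all hypothetical assumptions. Ordinal classes are assumed to be coded as numbers, so the same code handles ordinal classes and continuous (regression) targets.

```python
import math
from statistics import fmean, pstdev
from typing import List, Sequence, Tuple

# Hypothetical representation: an interval stored as (centre, radius).
Interval = Tuple[float, float]


def summarize(values: Sequence[float]) -> Interval:
    """Summarize target values as the interval [mean - sd, mean + sd] (assumption)."""
    return (fmean(values), pstdev(values))


def interval_distance(a: Interval, b: Interval) -> float:
    """Euclidean distance between intervals in (centre, radius) form (assumption)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def score_cut_points(
    x: Sequence[float], y: Sequence[float], k: int
) -> List[Tuple[float, float]]:
    """Return (cut_value, score) for every boundary between distinct sorted x values.

    The left neighbourhood is the k samples ending at the boundary and the right
    neighbourhood is the k samples starting after it, in sorted attribute order.
    """
    pairs = sorted(zip(x, y))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    scores = []
    for i in range(len(xs) - 1):
        if xs[i] == xs[i + 1]:
            continue  # a cut cannot separate equal attribute values
        left = ys[max(0, i - k + 1): i + 1]
        right = ys[i + 1: i + 1 + k]
        score = interval_distance(summarize(left), summarize(right))
        scores.append(((xs[i] + xs[i + 1]) / 2, score))
    return scores


def select_cut_points(
    scores: List[Tuple[float, float]], threshold: float
) -> List[float]:
    """Keep cut values whose score is a local maximum and at least `threshold`."""
    cuts = []
    for j, (cut, s) in enumerate(scores):
        rises = j == 0 or s > scores[j - 1][1]
        falls = j == len(scores) - 1 or s >= scores[j + 1][1]
        if rises and falls and s >= threshold:
            cuts.append(cut)
    return cuts


scores = score_cut_points([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1], k=2)
print(select_cut_points(scores, threshold=0.5))  # prints [3.5]
```

With k=2, attribute values 1 to 6 and targets [0, 0, 0, 1, 1, 1], the boundary at 3.5 scores highest and is the only cut kept at a threshold of 0.5. Keeping only local maxima, rather than every boundary above the threshold, avoids placing several cuts around a single change in the target.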

INDEX TERMS

Interval arithmetic, Clustering, classification, and association rules, Mining methods and algorithms

CITATION

Cecilio Angulo, Francisco J. Ruiz, "IDD: A Supervised Interval Distance-Based Method for Discretization",

*IEEE Transactions on Knowledge & Data Engineering*, vol. 20, no. 9, pp. 1230-1238, September 2008, doi:10.1109/TKDE.2008.66