Issue No.04 - April (2013 vol.25)

pp: 734-750

Salvador Garcia , Dept. of Comput. Sci., Univ. of Jaen, Jaen, Spain

J. Luengo , Dept. of Civil Eng., Univ. of Burgos, Burgos, Spain

José Antonio Sáez , Dept. of Comput. Sci. & Artificial Intell., Univ. of Granada, Granada, Spain

Victoria López , Dept. of Comput. Sci. & Artificial Intell., Univ. of Granada, Granada, Spain

F. Herrera , Dept. of Comput. Sci. & Artificial Intell., Univ. of Granada, Granada, Spain

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2012.35

ABSTRACT

Discretization is an essential preprocessing technique used in many knowledge discovery and data mining tasks. Its main goal is to transform a set of continuous attributes into discrete ones by associating categorical values with intervals, thus transforming quantitative data into qualitative data. In this manner, symbolic data mining algorithms can be applied to continuous data, and the representation of information is simplified, making it more concise and specific. The literature provides numerous discretization proposals, along with some attempts to categorize them into a taxonomy. However, previous papers lack consensus on the definition of the properties, and no formal categorization has been established yet, which may be confusing for practitioners. Furthermore, only a small set of discretizers has been widely considered, while many other methods have gone unnoticed. With the intention of alleviating these problems, this paper provides a survey of discretization methods proposed in the literature from a theoretical and empirical perspective. From the theoretical perspective, we develop a taxonomy based on the main properties pointed out in previous research, unifying the notation and including all the methods known to date. Empirically, we conduct an experimental study in supervised classification involving the most representative and newest discretizers, different types of classifiers, and a large number of data sets. Their performance, measured in terms of accuracy, number of intervals, and inconsistency, has been verified by means of nonparametric statistical tests. Additionally, a set of discretizers is highlighted as the best performing ones.
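To make the basic idea of mapping continuous values to intervals concrete, the following is a minimal sketch of equal-width binning, one of the simplest unsupervised schemes. It is an illustration only and is not presented as one of the methods the paper recommends; the function names `equal_width_cut_points` and `discretize` and the sample `ages` values are hypothetical.

```python
from bisect import bisect_right


def equal_width_cut_points(values, n_bins):
    """Return the n_bins - 1 interior cut points that split
    [min(values), max(values)] into intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def discretize(values, cut_points):
    """Map each continuous value to the index of the interval it falls in.

    Interval 0 lies below the first cut point; a value equal to a cut
    point is assigned to the interval above it.
    """
    return [bisect_right(cut_points, v) for v in values]


# Hypothetical continuous attribute (illustrative data only).
ages = [18.0, 22.5, 31.0, 47.0, 52.5, 68.0]
cuts = equal_width_cut_points(ages, 3)
labels = discretize(ages, cuts)
```

Here each integer label stands for one interval and could be replaced by a categorical name. Supervised discretizers of the kind compared in the survey differ mainly in how they choose the cut points, typically by using the class labels.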

INDEX TERMS

Taxonomy, Delta modulation, Heuristic algorithms, Merging, Algorithm design and analysis, Supervised learning, Electronic mail, classification, Discretization, continuous attributes, decision trees, taxonomy, data preprocessing, data mining

CITATION

Salvador Garcia, J. Luengo, José Antonio Sáez, Victoria López, F. Herrera, "A Survey of Discretization Techniques: Taxonomy and Empirical Analysis in Supervised Learning",

*IEEE Transactions on Knowledge & Data Engineering*, vol. 25, no. 4, pp. 734-750, April 2013, doi:10.1109/TKDE.2012.35