A Survey of Discretization Techniques: Taxonomy and Empirical Analysis in Supervised Learning
April 2013 (vol. 25 no. 4)
pp. 734-750
Salvador Garcia, Dept. of Comput. Sci., Univ. of Jaen, Jaen, Spain
J. Luengo, Dept. of Civil Eng., Univ. of Burgos, Burgos, Spain
José Antonio Sáez, Dept. of Comput. Sci. & Artificial Intell., Univ. of Granada, Granada, Spain
Victoria López, Dept. of Comput. Sci. & Artificial Intell., Univ. of Granada, Granada, Spain
F. Herrera, Dept. of Comput. Sci. & Artificial Intell., Univ. of Granada, Granada, Spain
Discretization is an essential preprocessing technique used in many knowledge discovery and data mining tasks. Its main goal is to transform a set of continuous attributes into discrete ones by associating categorical values with intervals, thus transforming quantitative data into qualitative data. In this manner, symbolic data mining algorithms can be applied to continuous data, and the representation of information is simplified, making it more concise and specific. The literature offers numerous discretization proposals, along with some attempts to organize them into a taxonomy. However, previous papers lack consensus on the definition of the properties, and no formal categorization has been established yet, which may be confusing for practitioners. Furthermore, only a small set of discretizers has been widely considered, while many other methods have gone unnoticed. With the intention of alleviating these problems, this paper provides a survey of the discretization methods proposed in the literature from a theoretical and an empirical perspective. From the theoretical perspective, we develop a taxonomy based on the main properties pointed out in previous research, unifying the notation and including all methods known to date. Empirically, we conduct an experimental study in supervised classification involving the most representative and newest discretizers, different types of classifiers, and a large number of data sets. Their performance, measured in terms of accuracy, number of intervals, and inconsistency, has been verified by means of nonparametric statistical tests. Additionally, a set of discretizers is highlighted as the best performing ones.
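The basic idea of mapping continuous values to labeled intervals can be illustrated with a minimal sketch. It uses equal-width binning, a simple unsupervised scheme chosen here only for illustration; it is not presented as one of the methods the survey recommends. The function names, the sample values, and the labels are all hypothetical.

```python
from bisect import bisect_right

def equal_width_cut_points(values, n_bins):
    """Return the n_bins - 1 interior cut points that split [min, max] evenly."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]

def discretize(values, cut_points, labels):
    """Map each continuous value to the label of the interval it falls in."""
    # bisect_right counts how many cut points are <= v, which is the
    # index of the interval containing v.
    return [labels[bisect_right(cut_points, v)] for v in values]

# Hypothetical continuous attribute and categorical labels (assumptions).
ages = [3.0, 17.5, 24.0, 41.0, 58.5, 63.0]
labels = ["low", "medium", "high"]

cuts = equal_width_cut_points(ages, 3)
binned = discretize(ages, cuts, labels)
print(cuts)
print(binned)
```

Supervised discretizers differ mainly in how the cut points are chosen, typically by also taking the class labels into account; the mapping step itself stays the same.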
Index Terms:
Taxonomy, Delta modulation, Heuristic algorithms, Merging, Algorithm design and analysis, Supervised learning, Electronic mail, classification, Discretization, continuous attributes, decision trees, taxonomy, data preprocessing, data mining
Citation:
Salvador Garcia, J. Luengo, José Antonio Sáez, Victoria López, F. Herrera, "A Survey of Discretization Techniques: Taxonomy and Empirical Analysis in Supervised Learning," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 4, pp. 734-750, April 2013, doi:10.1109/TKDE.2012.35