Issue No. 08 - Aug. (2012 vol. 24)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2011.101
Khurram Shehzad, University of Engineering and Technology, Taxila
Abstract—Discretization is a critical component of data mining whereby continuous attributes of a data set are converted into discrete ones by creating intervals either before or during learning. There are many good reasons for preprocessing discretization, such as increased learning efficiency and classification accuracy, comprehensibility of data mining results, as well as the inherent limitation of a great majority of learning algorithms to handle only discrete data. Many preprocessing discretization techniques have been proposed to date, of which the Entropy-MDLP discretization has been accepted as by far the most effective in the context of both decision tree learning and rule induction algorithms. This paper presents a new discretization technique, EDISC, which utilizes the entropy-based principle but takes a class-tailored approach to discretization. The technique is applicable in general to any covering algorithm, including those that use the class-per-class rule induction methodology, such as CN2, as well as those that use a seed example during the learning phase, such as the RULES family. Experimental evaluation has proved the efficiency and effectiveness of the technique as a preprocessing discretization procedure for CN2 as well as RULES-7, the latest algorithm among the RULES family of inductive learning algorithms.
Keywords: Discretization, continuous values, discrete values, data transformation, data mining, machine learning, inductive learning, supervised learning, rule induction.
K. Shehzad, "EDISC: A Class-Tailored Discretization Technique for Rule-Based Classification," in IEEE Transactions on Knowledge & Data Engineering, vol. 24, no. 8, pp. 1435-1447, 2011.
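For readers unfamiliar with the entropy-based principle the abstract refers to, the following is a minimal sketch of the conventional Entropy-MDLP baseline (recursive binary splitting with the Fayyad-Irani stopping test) for a single continuous attribute. It is not the EDISC technique itself, whose class-tailored procedure is not described in this record. The function names `entropy`, `best_cut`, `mdlp_accepts`, and `discretize` are hypothetical, chosen only for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_cut(values, labels):
    """Return (cut, weighted_entropy) for the best binary split, or None.

    Candidate cuts are midpoints between adjacent distinct sorted values.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = None
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no boundary can be placed between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        w = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if best is None or w < best[1]:
            best = ((lo + hi) / 2, w)
    return best


def mdlp_accepts(labels, left, right):
    """Fayyad-Irani MDLP test: accept a split only if its gain exceeds its cost."""
    n = len(labels)
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    ent, ent1, ent2 = entropy(labels), entropy(left), entropy(right)
    gain = ent - (len(left) * ent1 + len(right) * ent2) / n
    delta = log2(3 ** k - 2) - (k * ent - k1 * ent1 - k2 * ent2)
    return gain > (log2(n - 1) + delta) / n


def discretize(values, labels):
    """Recursively split one attribute; returns the sorted list of cut points."""
    found = best_cut(values, labels)
    if found is None:
        return []
    cut = found[0]
    left = [(v, y) for v, y in zip(values, labels) if v <= cut]
    right = [(v, y) for v, y in zip(values, labels) if v > cut]
    left_labels = [y for _, y in left]
    right_labels = [y for _, y in right]
    if not mdlp_accepts(labels, left_labels, right_labels):
        return []  # the split does not pay for its description cost
    return (discretize([v for v, _ in left], left_labels)
            + [cut]
            + discretize([v for v, _ in right], right_labels))
```

Calling `discretize(values, labels)` on one attribute column yields the interval boundaries. This baseline computes a single set of intervals shared by all classes; a class-tailored approach such as the one the abstract describes would presumably differ on this point.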