Issue No. 2, February 2004 (vol. 16)
Mark Last, IEEE Computer Society
<p><b>Abstract</b>—We describe and evaluate an information-theoretic algorithm for data-driven induction of classification models based on a minimal subset of the available features. The relationship between the input (predictive) features and the target (classification) attribute is modeled by a tree-like structure termed an <it>information network</it> (<it>IN</it>). Unlike other decision-tree models, the information network uses the same input attribute across all nodes of a given layer (level). The input attributes are selected incrementally by the algorithm to maximize the global decrease in the conditional entropy of the target attribute. We use a prepruning approach: when no candidate attribute causes a statistically significant decrease in the entropy, network construction stops. The algorithm is shown empirically to produce much more compact models than other decision-tree learning methods while preserving nearly the same level of classification accuracy.</p>
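The attribute-selection step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (<it>entropy</it>, <it>cond_entropy</it>, <it>select_attribute</it>) and the dictionary-based record format are assumptions, and the statistical significance test used for prepruning is omitted; the sketch only shows how each candidate attribute's decrease in conditional entropy (its mutual information with the target) would be compared.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def cond_entropy(values, labels):
    """Conditional entropy H(label | value): entropy within each
    attribute-value group, weighted by the group's relative size."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return sum(len(g) / n * entropy(g) for g in groups.values())

def select_attribute(records, target, candidates):
    """Pick the candidate attribute whose use as the next layer
    yields the largest decrease in the target's conditional entropy.
    Returns (attribute_name, entropy_decrease)."""
    labels = [r[target] for r in records]
    base = entropy(labels)
    best, best_gain = None, 0.0
    for a in candidates:
        gain = base - cond_entropy([r[a] for r in records], labels)
        if gain > best_gain:
            best, best_gain = a, gain
    return best, best_gain

# Toy data (hypothetical): attribute "x" determines the class "y",
# while "z" carries no information about it.
records = [
    {"x": 0, "z": 0, "y": "a"},
    {"x": 0, "z": 1, "y": "a"},
    {"x": 1, "z": 0, "y": "b"},
    {"x": 1, "z": 1, "y": "b"},
]
print(select_attribute(records, "y", ["x", "z"]))  # ('x', 1.0)
```

In the full algorithm, an attribute would be accepted only if its entropy decrease passes a statistical significance test; here a zero-gain attribute such as "z" is simply never preferred.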
Knowledge discovery in databases, data mining, classification, dimensionality reduction, feature selection, decision trees, information theory, information-theoretic network.
M. Last and O. Maimon, "A Compact and Accurate Model for Classification," IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 2, pp. 203-215, 2004.