A Compact and Accurate Model for Classification
February 2004 (vol. 16 no. 2)
pp. 203-215
Mark Last, IEEE Computer Society, and Oded Maimon

Abstract—We describe and evaluate an information-theoretic algorithm for data-driven induction of classification models based on a minimal subset of the available features. The relationship between the input (predictive) features and the target (classification) attribute is modeled by a tree-like structure termed an information network (IN). Unlike other decision-tree models, the information network uses the same input attribute across all nodes of a given layer (level). The input attributes are selected incrementally by the algorithm to maximize the global decrease in the conditional entropy of the target attribute. We use a prepruning approach: when no candidate attribute yields a statistically significant decrease in entropy, construction of the network stops. The algorithm is shown empirically to produce much more compact models than other methods of decision-tree learning while preserving nearly the same level of classification accuracy.
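The greedy, entropy-driven attribute selection described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the function names (`entropy`, `conditional_entropy`, `select_attributes`) and the `min_gain` stopping threshold are assumptions, and the threshold stands in for the paper's likelihood-ratio significance test, which is omitted here for brevity.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Empirical entropy H(Y) in bits of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def conditional_entropy(rows, labels, attrs):
    """H(Y | attrs): entropy of the label within each combination of
    values of the chosen attributes, weighted by group frequency."""
    groups = defaultdict(list)
    for row, y in zip(rows, labels):
        groups[tuple(row[a] for a in attrs)].append(y)
    n = len(labels)
    return sum(len(g) / n * entropy(g) for g in groups.values())

def select_attributes(rows, labels, candidates, min_gain=0.01):
    """Greedily add the attribute giving the largest drop in H(Y|selected).
    min_gain is a simplified stand-in for the statistical significance
    test used in the paper (prepruning): stop when the best remaining
    attribute no longer reduces the conditional entropy meaningfully."""
    selected = []
    h = conditional_entropy(rows, labels, selected)  # starts at H(Y)
    remaining = list(candidates)
    while remaining:
        gains = {a: h - conditional_entropy(rows, labels, selected + [a])
                 for a in remaining}
        best = max(gains, key=gains.get)
        if gains[best] < min_gain:
            break  # prepruning: no significant entropy decrease
        selected.append(best)
        remaining.remove(best)
        h -= gains[best]
    return selected
```

On a toy dataset where attribute 0 determines the label and attribute 1 is noise, `select_attributes([(0, 0), (0, 1), (1, 0), (1, 1)], [0, 0, 1, 1], [0, 1])` selects only attribute 0, mirroring the compactness the paper reports: each layer of the network is built from a single, globally chosen attribute, and irrelevant attributes are never added.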

[1] F. Attneave, Applications of Information Theory to Psychology. Holt, Rinehart, and Winston, 1959.
[2] C.L. Blake and C.J. Merz, UCI Repository of Machine Learning Databases, 19 July 2002.
[3] L. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone, Classification and Regression Trees. Wadsworth, 1984.
[4] T.M. Cover and J.A. Thomas, Elements of Information Theory. Wiley, 1991.
[5] P. Domingos and M. Pazzani, "On the Optimality of the Simple Bayesian Classifier under Zero-One Loss," Machine Learning, vol. 29, pp. 103-130, 1997.
[6] P. Domingos, "Occam's Two Razors: The Sharp and the Blunt," Proc. Fourth Int'l Conf. Knowledge Discovery and Data Mining, pp. 37-43, 1998.
[7] U. Fayyad and K. Irani, "Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning," Proc. 13th Int'l Joint Conf. Artificial Intelligence, pp. 1022-1027, 1993.
[8] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, "From Data Mining to Knowledge Discovery: An Overview," Advances in Knowledge Discovery and Data Mining, U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, eds., pp. 1-36, AAAI/MIT Press, 1996.
[9] A.L. Gorin, S.E. Levinson, A.N. Gertner, and E. Goldman, "Adaptive Acquisition of Language," Computer Speech and Language, vol. 5, no. 2, pp. 101-132, 1991.
[10] A.L. Gorin, S.E. Levinson, and A. Sankar, "An Experiment in Spoken Language Acquisition," IEEE Trans. Speech and Audio Processing, vol. 2, no. 1, pp. 224-239, 1994.
[11] G.H. John, R. Kohavi, and K. Pfleger, "Irrelevant Features and the Subset Selection Problem," Proc. 11th Int'l Conf. Machine Learning, pp. 121-129, 1994.
[12] M. Last, Y. Klein, and A. Kandel, "Knowledge Discovery in Time Series Databases," IEEE Trans. Systems, Man, and Cybernetics, Part B, vol. 31, no. 1, pp. 160-169, Feb. 2001.
[13] M. Last, A. Kandel, and O. Maimon, "Information-Theoretic Algorithm for Feature Selection," Pattern Recognition Letters, vol. 22, nos. 6-7, pp. 799-811, 2001.
[14] M. Last and A. Kandel, "Data Mining for Process and Quality Control in the Semiconductor Industry," Data Mining for Design and Manufacturing: Methods and Applications, D. Braha, ed., pp. 207-234, Boston: Kluwer Academic, 2001.
[15] H. Liu and H. Motoda, Feature Selection for Knowledge Discovery and Data Mining. Boston: Kluwer Academic, 1998.
[16] O. Maimon and M. Last, Knowledge Discovery and Data Mining: The Info-Fuzzy Network (IFN) Methodology. Boston: Kluwer Academic, 2001.
[17] E.W. Minium, R.B. Clarke, and T. Coladarci, Elements of Statistical Reasoning. New York: Wiley, 1999.
[18] T.M. Mitchell, Machine Learning. McGraw-Hill, 1997.
[19] J.R. Quinlan, "Induction of Decision Trees," Machine Learning, vol. 1, no. 1, pp. 81-106, 1986.
[20] J.R. Quinlan, "Simplifying Decision Trees," Int'l J. Man-Machine Studies, vol. 27, pp. 221-234, 1987.
[21] J.R. Quinlan, C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
[22] J.R. Quinlan, "Improved Use of Continuous Attributes in C4.5," J. Artificial Intelligence Research, vol. 4, pp. 77-90, 1996.
[23] C.R. Rao and H. Toutenburg, Linear Models: Least Squares and Alternatives. Springer-Verlag, 1995.
[24] R. Rastogi and K. Shim, "PUBLIC: A Decision Tree Classifier that Integrates Building and Pruning," Proc. 24th Int'l Conf. Very Large Databases (VLDB '98), pp. 404-415, 1998.
[25] P. Smyth and R. Goodman, "An Information Theoretic Approach to Rule Induction from Databases," IEEE Trans. Knowledge and Data Eng., vol. 4, no. 4, pp. 301-316, Aug. 1992.

Index Terms:
Knowledge discovery in databases, data mining, classification, dimensionality reduction, feature selection, decision trees, information theory, information-theoretic network.
Mark Last, Oded Maimon, "A Compact and Accurate Model for Classification," IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 2, pp. 203-215, Feb. 2004, doi:10.1109/TKDE.2004.1269598