A Compact and Accurate Model for Classification
February 2004 (vol. 16, no. 2)
pp. 203-215
Mark Last and Oded Maimon

Abstract—We describe and evaluate an information-theoretic algorithm for the data-driven induction of classification models based on a minimal subset of the available features. The relationship between the input (predictive) features and the target (classification) attribute is modeled by a tree-like structure termed an information network (IN). Unlike other decision-tree models, the information network uses the same input attribute across all nodes of a given layer (level). Input attributes are selected incrementally by the algorithm to maximize the global decrease in the conditional entropy of the target attribute. We use a prepruning approach: when no attribute yields a statistically significant decrease in the entropy, network construction stops. The algorithm is shown empirically to produce much more compact models than other decision-tree learning methods while preserving nearly the same level of classification accuracy.
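To make the construction procedure described above concrete, the following Python sketch implements a greedy, layer-wise attribute selection of the kind the abstract outlines: each layer adds the single attribute that most reduces the conditional entropy of the target over the current partition of records, and growth stops when the reduction is not statistically significant. The likelihood-ratio (G) test and its degrees of freedom used for the stopping rule here are illustrative assumptions, not necessarily the paper's exact prepruning test, and build_info_network is a name introduced for this sketch.

import numpy as np
from scipy.stats import chi2

def entropy(labels):
    """Empirical entropy of a label array, in nats."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log(p)).sum()

def cond_entropy(target, given):
    """H(target | given), where `given` holds the node id of each record."""
    n = len(target)
    h = 0.0
    for g in np.unique(given):
        mask = given == g
        h += mask.sum() / n * entropy(target[mask])
    return h

def build_info_network(X, y, alpha=0.001):
    """Greedy, layer-wise feature selection in the spirit of the IN model.

    X: (n_samples, n_features) integer-coded categorical attributes
    y: (n_samples,) integer-coded class labels
    Returns the ordered list of selected feature indices.
    """
    n, m = X.shape
    selected = []
    # All records start in a single root node of the top layer.
    nodes = np.zeros(n, dtype=np.int64)
    h_current = cond_entropy(y, nodes)
    remaining = set(range(m))

    while remaining:
        best, best_gain = None, 0.0
        for j in remaining:
            # Splitting every node of the layer by attribute j refines the
            # partition: encode (node id, attribute value) as a new node id.
            refined = nodes * (X[:, j].max() + 1) + X[:, j]
            gain = h_current - cond_entropy(y, refined)
            if gain > best_gain:
                best, best_gain = j, gain
        if best is None:
            break
        # Likelihood-ratio (G) significance test -- an assumed, illustrative
        # prepruning rule; entropies are in nats, so G = 2 * n * gain.
        g_stat = 2.0 * n * best_gain
        df = (len(np.unique(X[:, best])) - 1) * (len(np.unique(y)) - 1)
        if chi2.sf(g_stat, df) > alpha:
            break  # no statistically significant decrease: stop growing
        selected.append(best)
        remaining.remove(best)
        nodes = nodes * (X[:, best].max() + 1) + X[:, best]
        h_current = cond_entropy(y, nodes)
    return selected

Because every layer splits on a single attribute, the ordered list returned by build_info_network doubles as a ranked subset of selected features, which is the source of the model's compactness relative to conventional decision trees that may test a different attribute at each node.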

[1] F. Attneave, Applications of Information Theory to Psychology. Holt, Rinehart, and Winston, 1959.
[2] C.L. Blake and C.J. Merz, UCI Repository of Machine Learning Databases, http://www.ics.uci.edu/~mlearn/MLRepository.html, 19 July 2002.
[3] L. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone, Classification and Regression Trees. Wadsworth, 1984.
[4] T.M. Cover and J.A. Thomas, Elements of Information Theory. Wiley, 1991.
[5] P. Domingos and M. Pazzani, "On the Optimality of the Simple Bayesian Classifier under Zero-One Loss," Machine Learning, vol. 29, pp. 103-130, 1997.
[6] P. Domingos, "Occam's Two Razors: The Sharp and the Blunt," Proc. Fourth Int'l Conf. Knowledge Discovery and Data Mining, pp. 37-43, 1998.
[7] U. Fayyad and K. Irani, "Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning," Proc. 13th Int'l Joint Conf. Artificial Intelligence, pp. 1022-1027, 1993.
[8] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, "From Data Mining to Knowledge Discovery: An Overview," Advances in Knowledge Discovery and Data Mining, U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, eds., pp. 1-36, AAAI/MIT Press, 1996.
[9] A.L. Gorin, S.E. Levinson, A.N. Gertner, and E. Goldman, "Adaptive Acquisition of Language," Computer Speech and Language, vol. 5, no. 2, pp. 101-132, 1991.
[10] A.L. Gorin, S.E. Levinson, and A. Sankar, "An Experiment in Spoken Language Acquisition," IEEE Trans. Speech and Audio Processing, vol. 2, no. 1, pp. 224-239, 1994.
[11] G.H. John, R. Kohavi, and K. Pfleger, "Irrelevant Features and the Subset Selection Problem," Proc. 11th Int'l Conf. Machine Learning, pp. 121-129, 1994.
[12] M. Last, Y. Klein, and A. Kandel, "Knowledge Discovery in Time Series Databases," IEEE Trans. Systems, Man, and Cybernetics, Part B, vol. 31, no. 1, pp. 160-169, Feb. 2001.
[13] M. Last, A. Kandel, and O. Maimon, "Information-Theoretic Algorithm for Feature Selection," Pattern Recognition Letters, vol. 22, nos. 6-7, pp. 799-811, 2001.
[14] M. Last and A. Kandel, "Data Mining for Process and Quality Control in the Semiconductor Industry," Data Mining for Design and Manufacturing: Methods and Applications, D. Braha, ed., pp. 207-234, Boston: Kluwer Academic, 2001.
[15] H. Liu and H. Motoda, Feature Selection for Knowledge Discovery and Data Mining. Boston: Kluwer Academic, 1998.
[16] O. Maimon and M. Last, Knowledge Discovery and Data Mining: The Info-Fuzzy Network (IFN) Methodology. Boston: Kluwer Academic, 2001.
[17] E.W. Minium, R.B. Clarke, and T. Coladarci, Elements of Statistical Reasoning. New York: Wiley, 1999.
[18] T.M. Mitchell, Machine Learning. McGraw-Hill, 1997.
[19] J.R. Quinlan, "Induction of Decision Trees," Machine Learning, vol. 1, no. 1, pp. 81-106, 1986.
[20] J.R. Quinlan, "Simplifying Decision Trees," Int'l J. Man-Machine Studies, vol. 27, pp. 221-234, 1987.
[21] J.R. Quinlan, C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
[22] J.R. Quinlan, "Improved Use of Continuous Attributes in C4.5," J. Artificial Intelligence Research, vol. 4, pp. 77-90, 1996.
[23] C.R. Rao and H. Toutenburg, Linear Models: Least Squares and Alternatives. Springer-Verlag, 1995.
[24] R. Rastogi and K. Shim, "PUBLIC: A Decision Tree Classifier that Integrates Building and Pruning," Proc. 24th Int'l Conf. Very Large Data Bases (VLDB '98), pp. 404-415, 1998.
[25] P. Smyth and R. Goodman, "An Information Theoretic Approach to Rule Induction from Databases," IEEE Trans. Knowledge and Data Eng., vol. 4, no. 4, pp. 301-316, Aug. 1992.

Index Terms:
Knowledge discovery in databases, data mining, classification, dimensionality reduction, feature selection, decision trees, information theory, information-theoretic network.
Citation:
Mark Last, Oded Maimon, "A Compact and Accurate Model for Classification," IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 2, pp. 203-215, Feb. 2004, doi:10.1109/TKDE.2004.1269598