| | This Article | |
| |
| |
| | Share | |
| |
| |
| | Bibliographic References | |
| |
| |
| | Add to: | |
| |
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
| |
| | Search | |
| |
| |
| | |
A Compact and Accurate Model for Classification
February 2004 (vol. 16 no. 2)
pp. 203-215
Abstract—We describe and evaluate an information-theoretic algorithm for data-driven induction of classification models based on a minimal subset of available features. The relationship between input (predictive) features and the target (classification) attribute is modeled by a tree-like structure termed an information network (IN). Unlike other decision-tree models, the information network uses the same input attribute across the nodes of a given layer (level). The input attributes are selected incrementally by the algorithm to maximize a global decrease in the conditional entropy of the target attribute. We are using the prepruning approach: When no attribute causes a statistically significant decrease in the entropy, the network construction is stopped. The algorithm is shown empirically to produce much more compact models than other methods of decision-tree learning while preserving nearly the same level of classification accuracy.
[1] 203 F. Attneave, Applications of Information Theory to Psychology. Holt, Rinehart, and Winston, 1959.[2] C.L. Blake and C.J. Merz , UCI Repository of Machine Learning Databases,http://www.ics.uci.edu/~mlearnMLRepository. html , 19 July 2002.[3] L. Breiman, J.H. Friedman, R.A. Olshen, and P.J. Stone, Classification and Regression Trees. Wadsworth, 1984.[4] T.M. Cover and J.A. Thomas, Elements of Information Theory. Wiley, 1991.[5] P. Domingos and M. Pazzani, On the Optimality of the Simple Bayesian Classifier under Zero-One Loss Machine Learning, no. 29, pp. 103-130, 1997.[6] P. Domingos, Occam's Two Razors: The Sharp and the Blunt Proc. Fourth Int'l Conf. Knowledge Discovery and Data Mining, pp. 37-43, 1998.[7] U. Fayyad and K. Irani, Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning Proc. 13th Int'l Joint Conf. Artificial Intelligence, pp. 1022-1027, 1993.[8] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, From Data Mining to Knowledge Discovery: An Overview Advances in Knowledge Discovery and Data Mining, U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, eds., pp. 1-36, AAAI/MIT Press, 1996.[9] A.L. Gorin, S.E. Levinson, A.N. Gertner, and E. Goldman, Adaptive Acquisition of Language Computer Speech and Language, vol. 5, no. 2, pp. 101-132, 1991.[10] A.L. Gorin, S.E. Levinson, and A. Sankar, An Experiment in Spoken Language Acquisition IEEE Trans. Speech and Audio Processing, vol. 2, no. 1, pp. 224-239, 1994.[11] G.H. John, R. Kohavi, and K. Pfleger, Irrelevant Features and the Subset Selection Problem Proc. 11th Int'l Conf. Machine Learning, pp. 121-129, 1994.[12] M. Last, Y. Klein, and A. Kandel, Knowledge Discovery in Time Series Databases IEEE Trans. Systems, Man, and Cybernetics, Part B, vol. 31, no. 1, pp. 160-169, Feb. 2001.[13] M. Last, A. Kandel, and O. Maimon, Information-Theoretic Algorithm for Feature Selection Pattern Recognition Letters, vol. 22, nos. 6-7, pp. 799-811, 2001.[14] M. Last and A. Kandel, Data Mining for Process and Quality Control in the Semiconductor Industry Data Mining for Design and Manufacturing: Methods and Applications, D. Braha, ed., pp. 207-234, Boston: Kluwer Academic, 2001.[15] H. Liu and H. Motoda, Feature Selection for Knowledge Discovery and Data Mining. Boston: Kluwer Academic, 1998.[16] O. Maimon and M. Last, Knowledge Discovery and Data Mining, The Info-Fuzzy Network (IFN) Methodology. Boston: Kluwer Academic, 2001.[17] E.W. Minium, R.B. Clarke, and T. Coladarci, Elements of Statistical Reasoning. New York: Wiley, 1999.[18] T.M. Mitchell, Machine Learning. McGraw-Hill, 1997.[19] J.R. Quinlan, Induction of Decision Trees Machine Learning, vol. 1, no. 1, pp. 81-106, 1986.[20] J.R. Quinlan, Simplifying Decision Trees Int'l J. Man-Machine Studies, no. 27, pp. 221-234, 1987.[21] J.R. Quinlan, C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.[22] J.R. Quinlan, Improved Use of Continuous Attributes in C4.5 J. Artificial Intelligence Research, no. 4, pp. 77-90, 1996.[23] C.R. Rao and H. Toutenburg, Linear Models: Least Squares and Alternatives. Springer-Verlag, 1995.[24] R. Rastogi and K. Shim, PUBLIC: A Decision Tree Classifier that Integrates Building and Pruning Proc. 24th Int'l Conf.Very Large Databases (VLDB '98), pp. 404-415, 1998.[25] P. Smyth and R. Goodman, "An Information Theoretic Approach to Rule Induction from Databases," IEEE Trans Knowledge and Data Eng., vol. 4, no. 4, pp. 301-316, Aug. 1992.
Index Terms:
Knowledge discovery in databases, data mining, classification, dimensionality reduction, feature selection, decision trees, information theory, Information theoretic network.
Citation:
Mark Last, Oded Maimon, "A Compact and Accurate Model for Classification," IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 2, pp. 203-215, Feb. 2004, doi:10.1109/TKDE.2004.1269598