An Information Theoretic Approach to Rule Induction from Databases
August 1992 (vol. 4, no. 4), pp. 301-316

An algorithm for the induction of rules from examples is introduced. The algorithm is novel in that it not only learns rules for a given concept (classification) but simultaneously learns rules relating multiple concepts. This type of learning, known as generalized rule induction, is considerably more general than existing algorithms, which tend to be classification oriented. The paper focuses initially on the problem of determining a quantitative, well-defined rule preference measure. In particular, a quantity called the J-measure is proposed as an information-theoretic alternative to existing approaches; it quantifies the information content of a rule or hypothesis. The information-theoretic origins of this measure are outlined, and its plausibility as a hypothesis preference measure is examined. The ITRULE algorithm, which uses the measure to learn a set of optimal rules from a set of data samples, is then defined, and experimental results on real-world data are analyzed.
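The abstract does not reproduce the J-measure itself. As a hedged sketch, the form generally attributed to this paper scores a rule "if Y = y then X = x" by weighting the Kullback-Leibler distance between the posterior p(x|y) and prior p(x) of the consequent by the probability p(y) that the antecedent holds (the function name and argument names below are illustrative, not taken from the paper):

```python
import math

def j_measure(p_y, p_x_given_y, p_x):
    """Information content J(X; Y=y) of the rule 'if Y=y then X=x'.

    J = p(y) * [ p(x|y) log2( p(x|y) / p(x) )
                 + (1 - p(x|y)) log2( (1 - p(x|y)) / (1 - p(x)) ) ]

    The bracketed term is the relative entropy between the binary
    posterior and prior distributions of X; multiplying by p(y)
    penalizes rules whose antecedent rarely fires.
    """
    def term(post, prior):
        # Convention: 0 * log(0/q) = 0.
        return 0.0 if post == 0.0 else post * math.log2(post / prior)
    return p_y * (term(p_x_given_y, p_x) + term(1.0 - p_x_given_y, 1.0 - p_x))

# A rule whose consequent probability equals the prior is uninformative:
print(j_measure(0.5, 0.3, 0.3))   # 0.0
# A confident rule with a rare antecedent vs. a weaker but common one:
print(j_measure(0.1, 0.9, 0.5))
print(j_measure(0.6, 0.7, 0.5))
```

The trade-off the weighting captures is exactly the one a rule-preference measure needs: a highly specific rule can have a large relative-entropy term yet still rank low because it applies to few samples.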

[1] N. E. Johnson, "Mediating representations in knowledge elicitation," in Proc. First European Workshop on Knowledge Acquisition for Knowledge-Based Systems, Reading, England, 1987.
[2] P. E. Johnson, "What kind of expert system should a system be?," J. Med. Philosophy, vol. 8, pp. 77-97, 1983.
[3] A. Hart, Knowledge Acquisition for Expert Systems. New York: McGraw-Hill, 1986.
[4] D. Kahneman, P. Slovic, and A. Tversky, Judgement under Uncertainty: Heuristics and Biases. Cambridge, England: Cambridge University Press, 1982.
[5] Y. Bishop, S. E. Fienberg, and P. W. Holland, Discrete Multivariate Analysis: Theory and Practice. Cambridge, MA: MIT Press, 1975.
[6] A. K. C. Wong and D. K. Y. Chiu, "Synthesizing statistical knowledge from incomplete mixed-mode data," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-9, no. 6, pp. 796-805, Nov. 1987.
[7] D. V. Lindley, "Scoring rules and the inevitability of probability," Int. Statist. Rev., vol. 50, pp. 1-26, Jan. 1986.
[8] P. Cheeseman, "In defense of probability," in Proc. Ninth Int. Joint Conf. on Artificial Intelligence, vol. 2, 1985, pp. 1002-1009.
[9] R. M. Goodman and P. Smyth, "An information-theoretic model for rule-based expert systems," presented at the 1988 Int. Symp. on Information Theory, Kobe, Japan, 1988.
[10] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis. New York: Wiley, 1973.
[11] E. M. Gold, "Language identification in the limit," Inform. Control, vol. 10, pp. 447-474, 1967.
[12] L. G. Valiant, "A theory of the learnable," Comm. ACM, vol. 27, pp. 1134-1142, Nov. 1984.
[13] D. Haussler, "Bias, version spaces and Valiant's learning framework," in Proc. Fourth Int. Workshop on Machine Learning, 1987, pp. 324-336.
[14] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning internal representations by error propagation," in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vols. 1 and 2. Cambridge, MA: MIT Press, 1986.
[15] P. R. Cohen and E. A. Feigenbaum, The Handbook of Artificial Intelligence, vol. 3. Los Altos, CA: William Kaufmann, 1982.
[16] T. M. Mitchell, "Generalization as search," Artif. Intell., vol. 18, no. 2, pp. 203-226, 1982.
[17] R. S. Michalski and J. B. Larson, "Selection of most representative training examples and incremental generation of VL1 hypotheses," Rep. 867, Computer Science Department, Univ. of Illinois, 1978.
[18] R. S. Michalski and R. L. Chilausky, "Learning by being told and learning from examples," Int. J. Policy Anal. Inform. Syst., vol. 4, pp. 125-161, 1980.
[19] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81-106, 1986.
[20] J. R. Quinlan and R. L. Rivest, "Inferring decision trees using the minimum description length principle," Inform. Computat., vol. 80, pp. 227-248, 1989.
[21] B. Arbab and D. Michie, "Generating rules from examples," in Proc. Ninth Int. Joint Conf. on Artificial Intelligence, 1985, pp. 631-633.
[22] R. Goodman and P. Smyth, "Decision tree design from a communication theory standpoint," IEEE Trans. Inform. Theory, vol. IT-34, pp. 979-994, 1988.
[23] R. M. Goodman and P. Smyth, "Decision tree design using information theory," Knowledge Acquisition, vol. 2, pp. 1-19, 1990.
[24] B. R. Gaines and M. L. G. Shaw, "Induction of inference rules for expert systems," Fuzzy Sets and Systems, vol. 18, pp. 315-328, 1986.
[25] J. Boose, "Personal construct theory and the transfer of expertise," in Proc. AAAI, 1984, pp. 27-33.
[26] J. G. Ganascia, "Learning with Hilbert cubes," in Proc. Second European Working Session on Learning (EWSL), Bled, Yugoslavia, 1987.
[27] J. R. Quinlan, "Generating production rules from examples," in Proc. Tenth Int. Joint Conf. on Artificial Intelligence, 1987, pp. 304-307.
[28] J. Cendrowska, "PRISM: An algorithm for inducing modular rules," Int. J. Man-Machine Studies, vol. 27, pp. 349-370, 1987.
[29] P. Clark and T. Niblett, "The CN2 induction algorithm," Machine Learning, vol. 3, pp. 261-283, 1989.
[30] R. L. Rivest, "Learning decision lists," Machine Learning, vol. 2, pp. 229-246, 1987.
[31] P. Cheeseman, "Learning of expert systems from data," in Proc. First IEEE Conf. on Applications of Artificial Intelligence, 1984.
[32] C. E. Shannon, "A mathematical theory of communication," Bell Syst. Tech. J., vol. 27, no. 3, pp. 379-423, July 1948.
[33] N. M. Blachman, "The amount of information that y gives about X," IEEE Trans. Inform. Theory, vol. IT-14, pp. 27-31, Jan. 1968.
[34] P. Smyth and R. M. Goodman, "The information content of a probabilistic rule," to be published.
[35] J. E. Shore and R. W. Johnson, "Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy," IEEE Trans. Inform. Theory, vol. IT-26, pp. 26-37, Jan. 1980.
[36] S. Kullback, Information Theory and Statistics. New York: Wiley, 1959.
[37] R. E. Blahut, Principles and Practice of Information Theory. Reading, MA: Addison-Wesley, 1987.
[38] D. Angluin and C. H. Smith, "Inductive inference: Theory and methods," ACM Comput. Surveys, vol. 15, pp. 237-269, 1983.
[39] B. R. Gaines, "Behavior/structure transformations under uncertainty," Int. J. Man-Machine Studies, vol. 8, pp. 337-365, 1976.
[40] R. S. Michalski, "Pattern recognition as rule-guided inference," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-2, pp. 349-361, July 1980.
[41] J. H. Holland, K. J. Holyoak, R. E. Nisbett, and P. R. Thagard, Induction: Processes of Inference, Learning, and Discovery. Cambridge, MA: MIT Press, 1986.
[42] I. J. Good, "The estimation of probabilities: An essay on modern Bayesian methods," Res. Monograph 30, MIT, Cambridge, MA, 1965.
[43] W. Feller, An Introduction to Probability Theory and Its Applications, vol. 1. New York: Wiley, 1968.
[44] American Association of Individual Investors, The Individual Investor's Guide to No-Load Mutual Funds. Chicago, IL: International, 1987.
[45] Congressional Quarterly Almanac, 98th Congress, 2nd Session, 1984. Washington, DC, 1985.
[46] J. C. Schlimmer, "Concept acquisition through representational adjustment," Ph.D. dissertation, Dep. Comput. Sci., Univ. of California, Irvine, CA, 1987.
[47] J. R. Quinlan, "Discovering rules by induction from large collections of examples," in Expert Systems in the Micro-electronic Age, D. Michie, Ed. Edinburgh, Scotland: Edinburgh University Press, 1979.
[48] R. M. Goodman, J. W. Miller, and P. Smyth, "The information provided by a linear threshold function with binary weights," presented at the 1990 IEEE Int. Symp. Information Theory, San Diego, CA, Jan. 1990.

Index Terms:
information theoretic approach; rule induction from databases; multiple concepts; learning; generalized rule induction; rule preference measure; J-measure; hypothesis preference measure; ITRULE algorithm; database management systems; expert systems; information theory; knowledge acquisition; learning systems
P. Smyth, R.M. Goodman, "An Information Theoretic Approach to Rule Induction from Databases," IEEE Transactions on Knowledge and Data Engineering, vol. 4, no. 4, pp. 301-316, Aug. 1992, doi:10.1109/69.149926