
P. Smyth and R. M. Goodman, "An Information Theoretic Approach to Rule Induction from Databases," IEEE Transactions on Knowledge and Data Engineering, vol. 4, no. 4, pp. 301-316, Aug. 1992, doi: 10.1109/69.149926.
An algorithm for the induction of rules from examples is introduced. The algorithm is novel in the sense that it not only learns rules for a given concept (classification), but simultaneously learns rules relating multiple concepts. This type of learning, known as generalized rule induction, is considerably more general than existing algorithms, which tend to be classification oriented. Initially, the focus is on the problem of determining a quantitative, well-defined rule preference measure. In particular, a quantity called the J-measure is proposed as an information-theoretic alternative to existing approaches. The J-measure quantifies the information content of a rule or a hypothesis. The information-theoretic origins of this measure are outlined, and its plausibility as a hypothesis preference measure is examined. The ITRULE algorithm, which uses the measure to learn a set of optimal rules from a set of data samples, is defined. Experimental results on real-world data are analyzed.
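As a rough illustration of the rule preference measure described above: for a probabilistic rule "if Y = y then X = x," the J-measure is the antecedent probability p(y) times the relative entropy between the posterior p(x|y) and the prior p(x) on the binary event X = x. The sketch below assumes the three probabilities are already estimated from data; the example figures are invented for demonstration and are not from the paper's experiments.

```python
import math

def j_measure(p_y, p_x_given_y, p_x):
    """J(X; Y=y) for the rule 'if Y=y then X=x'.

    p_y         -- probability of the antecedent, p(y)
    p_x_given_y -- confidence of the rule, p(x|y)
    p_x         -- prior probability of the consequent, p(x)
    """
    def term(p, q):
        # One term p*log2(p/q) of the relative entropy,
        # with the usual convention 0*log(0/q) = 0.
        return 0.0 if p == 0 else p * math.log2(p / q)

    # j(X; Y=y): relative entropy between the posterior and prior
    # distributions over the two outcomes {x, not-x}.
    j = term(p_x_given_y, p_x) + term(1 - p_x_given_y, 1 - p_x)
    # J-measure = p(y) * j(X; Y=y): average information content of the rule.
    return p_y * j

# Hypothetical example: the antecedent holds 30% of the time and
# lifts the consequent's probability from 0.5 to 0.9.
print(round(j_measure(0.3, 0.9, 0.5), 4))  # about 0.1593 bits
```

Note the trade-off the measure captures: a very confident rule with a rare antecedent (small p(y)) can score lower than a moderately confident but widely applicable one, which is what makes it usable as a ranking criterion over candidate rules.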