Systems for Knowledge Discovery in Databases
December 1993 (vol. 5 no. 6)
pp. 903-913

Knowledge-discovery systems face challenging problems from real-world databases, which tend to be dynamic, incomplete, redundant, noisy, sparse, and very large. These problems are addressed and some techniques for handling them are described. A model of an idealized knowledge-discovery system is presented as a reference for studying and designing new systems. This model is used in the comparison of three systems: CoverStory, EXPLORA, and the Knowledge Discovery Workbench. The deficiencies of existing systems relative to the model reveal several open problems for future research.

Index Terms:
knowledge discovery; real-world databases; idealized knowledge-discovery system; CoverStory; EXPLORA; Knowledge Discovery Workbench; future research; KDD systems; machine learning; knowledge acquisition; deductive databases; knowledge based systems; learning (artificial intelligence)
Citation:
C.J. Matheus, P.K. Chan, G. Piatetsky-Shapiro, "Systems for Knowledge Discovery in Databases," IEEE Transactions on Knowledge and Data Engineering, vol. 5, no. 6, pp. 903-913, Dec. 1993, doi:10.1109/69.250073