An Empirical Study of Domain Knowledge and Its Benefits to Substructure Discovery
July-August 1997 (vol. 9 no. 4)
pp. 575-586

Abstract—Discovering repetitive, interesting, and functional substructures in a structural database improves the ability to interpret and compress the data. However, scientists working with a database in their area of expertise often search for predetermined types of structures or for structures exhibiting characteristics specific to the domain. This paper presents a method for guiding the discovery process with domain-specific knowledge. In this paper, the SUBDUE discovery system is used to evaluate the benefits of using domain knowledge to guide the discovery process. Domain knowledge is incorporated into SUBDUE following a single general methodology to guide the discovery process. Results show that domain-specific knowledge improves the search for substructures that are useful to the domain and leads to greater compression of the data. To illustrate these benefits, examples and experiments from the computer programming, computer-aided design circuit, and artificially generated domains are presented.

[1] P. Cheeseman, J. Kelly, M. Self, J. Stutz, W. Taylor, and D. Freeman, "Autoclass: A Bayesian Classification System," Proc. Fifth Int'l Workshop Machine Learning, pp. 54-64, 1988.
[2] D.H. Fisher, "Knowledge Acquisition via Incremental Conceptual Clustering," Machine Learning, vol. 2, no. 2, pp. 139-172, 1987.
[3] Knowledge Discovery in Databases, W.J. Frawley, G. Piatetsky-Shapiro, and C.J. Matheus, eds., AAAI Press/MIT Press, 1991.
[4] J.R. Quinlan, "Induction of Decision Trees," Machine Learning, vol. 1, pp. 81-106, 1986.
[5] J. Quinlan and R. Rivest, "Inferring Decision Trees Using Minimum Description Length Principle," Information and Computation, vol. 80, pp. 227-248, 1989.
[6] D.J. Cook, L.B. Holder, and S. Djoko, "Knowledge Discovery from Structural Data," J. Intelligent Information Systems, vol. 5, no. 3, pp. 229-245, 1995.
[7] D. Conklin, S. Fortier, J. Glasgow, and F. Allen, "Discovery of Spatial Concepts in Crystallographic Databases," Proc. Ninth Int'l Machine Learning Workshop, pp. 111-116, 1992.
[8] R. Levinson, "A Self-Organizing Retrieval System for Graphs," Proc. Second Nat'l Conf. Artificial Intelligence, pp. 203-206, 1984.
[9] J. Segen, "Learning Graph Models of Shape," Proc. Fifth Int'l Conf. Machine Learning, pp. 29-35, 1988.
[10] K. Thompson and P. Langley, "Concept Formation in Structured Domains," in Concept Formation: Knowledge and Experience in Unsupervised Learning, D.H. Fisher and M. Pazzani, eds., Morgan Kaufmann, San Francisco, 1991.
[11] P.H. Winston, "Learning Structural Descriptions from Examples," The Psychology of Computer Vision, P.H. Winston, ed., pp. 157-210. McGraw-Hill, 1975.
[12] A.K.C. Wong and M. You, "Entropy and Distance of Random Graphs with Application to Structural Pattern Recognition," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 7, no. 5, pp. 599-609, 1985.
[13] H. Bunke and G. Allermann, "Inexact Graph Matching for Structural Pattern Recognition," Pattern Recognition Letters, vol. 1, no. 4, pp. 245-253, 1983.
[14] A. Sanfeliu and K.S. Fu, "A Distance Measure between Attributed Relational Graphs for Pattern Recognition," IEEE Trans. Systems, Man, and Cybernetics, vol. 13, pp. 353-362, 1983.
[15] K. Yoshida, H. Motoda, and N. Indurkhya, "Unifying Learning Methods by Colored Digraphs," Proc. Learning and Knowledge Acquisition Workshop at IJCAI-93, 1993.
[16] D.J. Cook and L.B. Holder, "Substructure Discovery Using Minimum Description Length and Background Knowledge," J. Artificial Intelligence Research, vol. 1, pp. 231-255, 1994.
[17] J. Rissanen, Stochastic Complexity in Statistical Inquiry. World Scientific Series in Computer Science, vol. 15, 1989.
[18] Y.G. Leclerc, "Constructing Simple Stable Descriptions for Image Partitioning," Int'l J. Computer Vision, vol. 3, no. 1, pp. 73-102, 1989.
[19] E.P.D. Pednault, "Some Experiments in Applying Inductive Inference Principles to Surface Reconstruction," Proc. Int'l Joint Conf. Artificial Intelligence, pp. 1603-1609, 1989.
[20] A. Pentland, "Part Segmentation for Object Recognition," Neural Computation, vol. 1, pp. 82-91, 1989.
[21] M. Derthick, "A Minimal Encoding Approach to Feature Discovery," Proc. Ninth Nat'l Conf. Artificial Intelligence, pp. 565-571, 1991.
[22] R.B. Rao and S.C. Lu, "Learning Engineering Models with the Minimum Description Length Principle," Proc. 10th Nat'l Conf. Artificial Intelligence, pp. 717-722, 1992.
[23] L.T. Bruton, RC-Active Circuits Theory and Design, Prentice Hall, 1980.

Index Terms:
Data mining, minimum description length principle, data compression, inexact graph match, domain knowledge.
Surnjani Djoko, Diane J. Cook, Lawrence B. Holder, "An Empirical Study of Domain Knowledge and Its Benefits to Substructure Discovery," IEEE Transactions on Knowledge and Data Engineering, vol. 9, no. 4, pp. 575-586, July-Aug. 1997, doi:10.1109/69.617051