Scalable Discovery of Informative Structural Concepts Using Domain Knowledge
October 1996 (vol. 11 no. 5)
pp. 59-68

This article uses the Subdue system to evaluate the benefits of using domain knowledge to guide the discovery of repetitive, functional substructures in large structural databases. Results show that domain-specific knowledge improves the search for such substructures and enables greater data compression.

The increasing amount and complexity of today's data create an urgent need to accelerate the discovery of knowledge in large databases. In response, designers have developed numerous approaches for discovering concepts in databases using a linear, attribute-value representation. These approaches address issues of data relevance, missing data, noise, and domain knowledge. However, much of the data collected is structural in nature, that is, composed of parts and relations between the parts. Hence, there is a need for scalable tools to analyze and discover concepts in structural databases. Many reported discovery tools are also computationally expensive and cannot scale easily to large databases, especially those containing structural information.

Recently, we introduced a method for discovering substructures in structural databases using the minimum description length (MDL) principle. The system, called Subdue, discovers substructures that compress the input database and represent structural concepts. Once Subdue discovers a substructure, the system simplifies the data by replacing instances of the substructure with a pointer to the substructure definition. The discovered substructures allow abstraction over detailed structures in the original data. Iteration of the substructure discovery and replacement process constructs a hierarchical description of the structural data in terms of the discovered substructures. This hierarchy provides varying levels of interpretation that users can access based on the specific goals of the data analysis.
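
The replacement step described above can be illustrated with a minimal sketch. It is not Subdue's implementation: the `Graph` class, the `size` measure (vertices plus edges, standing in for a description length in bits), the `compress` helper, and the ratio used as `value` are all assumptions made for illustration, and the instances of the substructure are supplied by hand rather than discovered.

```python
from dataclasses import dataclass


@dataclass
class Graph:
    vertices: dict  # vertex id -> label
    edges: set      # set of (source id, target id, edge label)


def size(g):
    # Crude stand-in for description length: count vertices and edges.
    return len(g.vertices) + len(g.edges)


def compress(g, instances, pointer_label):
    """Replace each instance (a set of vertex ids) with one pointer vertex."""
    owner = {}
    for i, inst in enumerate(instances):
        for v in inst:
            owner[v] = f"{pointer_label}#{i}"
    # Keep vertices outside every instance, then add one pointer per instance.
    vertices = {v: lbl for v, lbl in g.vertices.items() if v not in owner}
    for i in range(len(instances)):
        vertices[f"{pointer_label}#{i}"] = pointer_label
    edges = set()
    for s, d, lbl in g.edges:
        s2, d2 = owner.get(s, s), owner.get(d, d)
        if s in owner and s2 == d2:
            continue  # edge lies inside an instance; the definition covers it
        edges.add((s2, d2, lbl))
    return Graph(vertices, edges)


# Hypothetical input: two copies of an A-B-C chain, both attached to X.
g = Graph(
    vertices={"a1": "A", "b1": "B", "c1": "C",
              "a2": "A", "b2": "B", "c2": "C", "x": "X"},
    edges={("a1", "b1", "on"), ("b1", "c1", "on"),
           ("a2", "b2", "on"), ("b2", "c2", "on"),
           ("c1", "x", "near"), ("c2", "x", "near")},
)

# The substructure definition, stored once.
sub = Graph(
    vertices={"a": "A", "b": "B", "c": "C"},
    edges={("a", "b", "on"), ("b", "c", "on")},
)

compressed = compress(g, [{"a1", "b1", "c1"}, {"a2", "b2", "c2"}], "S")

# Assumed score: original size over (definition size + compressed size).
value = size(g) / (size(sub) + size(compressed))
print(size(g), size(sub), size(compressed), value)
```

Under these assumptions, a value above 1 means that storing the definition once and pointing to it is smaller than the original graph. Running `compress` again on its own output, with a substructure that mentions the pointer label `S`, would give the kind of hierarchy the paragraph describes.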

In this article, we focus on how to realize the benefits of domain-dependent discovery approaches by adding domain-specific knowledge to a domain-independent discovery system. We also evaluate the benefits and costs of using domain-specific information. In particular, we measure the performance of the Subdue system with and without domain-specific knowledge along the performance dimensions of compression, the time needed to discover the substructures, and the usefulness of the discovered substructures. Finally, we address the issue of scalability of structure discovery using Subdue. On the basis of scalability tests we've conducted, we highlight features of databases that can affect Subdue's performance.

Citation:
Diane J. Cook, Lawrence B. Holder, Surnjani Djoko, "Scalable Discovery of Informative Structural Concepts Using Domain Knowledge," IEEE Intelligent Systems, vol. 11, no. 5, pp. 59-68, Oct. 1996, doi:10.1109/64.539018