2007 IEEE 23rd International Conference on Data Engineering
Organizing Hidden-Web Databases by Clustering Visible Web Documents
Istanbul, Turkey
April 15-April 20
ISBN: 1-4244-0802-4
Luciano Barbosa, University of Utah, lbarbosa@cs.utah.edu
Juliana Freire, University of Utah, juliana@cs.utah.edu
Altigran Silva, Universidade Federal do Amazonas, alti@dcc.ufam.edu.br
In this paper we address the problem of organizing hidden-Web databases. Given a heterogeneous set of Web forms that serve as entry points to hidden-Web databases, our goal is to cluster the forms according to the database domains to which they belong. We propose a new clustering approach that models Web forms as a set of hyperlinked objects and considers visible information in the form context, both within and in the neighborhood of forms, as the basis for similarity comparison. Since the clustering is performed over features that can be automatically extracted, the process is scalable. In addition, because it uses a rich set of metadata, our approach is able to handle a wide range of forms, including content-rich forms that contain multiple attributes, as well as simple keyword-based search interfaces. An experimental evaluation over real Web data shows that our strategy generates high-quality clusters, measured both in terms of entropy and F-measure. This indicates that our approach provides an effective and general solution to the problem of organizing hidden-Web databases.
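The two quality measures named in the abstract can be illustrated with a minimal sketch. It assumes the commonly used definitions: size-weighted entropy of gold domain labels within each cluster (lower is better), and a class-weighted F-measure that takes, for each domain, the best-matching cluster (higher is better). The paper may use different variants; the function names and the sample domain labels are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def cluster_entropy(labels, clusters):
    """Size-weighted average entropy (base 2) of gold labels inside each cluster."""
    members = defaultdict(list)
    for label, cluster in zip(labels, clusters):
        members[cluster].append(label)

    total = len(labels)
    entropy = 0.0
    for cluster_labels in members.values():
        size = len(cluster_labels)
        counts = Counter(cluster_labels)
        # Entropy of the label distribution within this cluster.
        h = -sum((c / size) * math.log2(c / size) for c in counts.values())
        entropy += (size / total) * h
    return entropy


def cluster_f_measure(labels, clusters):
    """Class-weighted F-measure: each gold class is scored by its best cluster."""
    pair_counts = Counter(zip(labels, clusters))
    label_sizes = Counter(labels)
    cluster_sizes = Counter(clusters)

    total = len(labels)
    score = 0.0
    for label, n_label in label_sizes.items():
        best = 0.0
        for cluster, n_cluster in cluster_sizes.items():
            n = pair_counts[(label, cluster)]
            if n == 0:
                continue  # no overlap, F is zero for this pair
            precision = n / n_cluster
            recall = n / n_label
            best = max(best, 2 * precision * recall / (precision + recall))
        score += (n_label / total) * best
    return score


# Hypothetical example: four forms from two domains, assigned to two clusters.
gold_domains = ["jobs", "jobs", "books", "books"]
assigned = [0, 0, 1, 1]
print(cluster_entropy(gold_domains, assigned))
print(cluster_f_measure(gold_domains, assigned))
```

A clustering that keeps each domain in its own cluster drives entropy toward zero and F-measure toward one, while clusters that mix domains evenly do the opposite.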
Citation:
Luciano Barbosa, Juliana Freire, Altigran Silva, "Organizing Hidden-Web Databases by Clustering Visible Web Documents," 2007 IEEE 23rd International Conference on Data Engineering (ICDE), pp. 326-335, 2007