12th IEEE International Workshop on Program Comprehension (IWPC'04)
An Empirical Study on Keyword-based Web Site Clustering
Bari, Italy
June 24-26, 2004
ISBN: 0-7695-2149-5
Filippo Ricca, ITC-irst, Italy
Paolo Tonella, ITC-irst, Italy
Christian Girardi, ITC-irst, Italy
Emanuele Pianta, ITC-irst, Italy
Web site evolution typically proceeds with limited support for the developers' understanding activities; in particular, design diagrams are often missing or outdated. A potentially interesting option is to reverse engineer high-level views of Web sites from the content of their Web pages. Clustering is a valuable technique for this purpose: Web pages can be grouped according to the similarity of summary information about their content, represented as a list of automatically extracted keywords.
This paper presents an empirical study conducted to determine how meaningful the clusters automatically produced from the analysis of Web page content are to Web developers. Natural Language Processing (NLP) plays a central role in content analysis and keyword extraction. Thus, a second objective of the study was to assess the contribution of some shallow NLP techniques to the clustering task.
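The general idea of keyword-based page clustering can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not specify the keyword extraction, the similarity measure, or the clustering algorithm, so the sketch assumes plain term-frequency keywords, Jaccard similarity, and single-link agglomerative merging. The stop-word list, the page names, the `threshold` value, and all function names are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not taken from the paper).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "is", "on", "with", "each"}


def extract_keywords(text, top_n=10):
    """Return up to top_n of the most frequent non-stop-word tokens as a set."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return {word for word, _ in counts.most_common(top_n)}


def jaccard(a, b):
    """Jaccard similarity between two keyword sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_pages(pages, threshold=0.2):
    """Single-link agglomerative clustering of pages by keyword similarity.

    Two clusters are merged whenever at least one pair of pages, one from
    each cluster, has Jaccard similarity >= threshold.
    """
    keywords = {name: extract_keywords(text) for name, text in pages.items()}
    clusters = [{name} for name in pages]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if any(
                    jaccard(keywords[p], keywords[q]) >= threshold
                    for p in clusters[i]
                    for q in clusters[j]
                ):
                    clusters[i] |= clusters[j]
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break  # restart the scan after every merge
    return clusters


# Hypothetical page contents, used only to exercise the sketch.
pages = {
    "courses.html": "Courses and lectures: course schedule, lecture notes, course exams",
    "exams.html": "Exams for each course: exam dates, lecture rooms, course registration",
    "staff.html": "Staff directory: staff phone numbers, staff offices, staff email",
}

if __name__ == "__main__":
    for cluster in cluster_pages(pages):
        print(sorted(cluster))
```

A shallow NLP step such as stemming or lemmatization would let "course" and "courses" count as the same keyword, which is the kind of contribution the study sets out to assess.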
Citation:
Filippo Ricca, Paolo Tonella, Christian Girardi, Emanuele Pianta, "An Empirical Study on Keyword-based Web Site Clustering," 12th IEEE International Workshop on Program Comprehension (IWPC'04), p. 204, 2004