Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06)
Understanding the Web Page Layout
Hong Kong, China
December 18-22, 2006
ISBN: 0-7695-2702-7
Minghong Zhou, Chinese Academy of Sciences
Rubao Li, Chinese Academy of Sciences
Wei Li, Chinese Academy of Sciences
Web pages express their semantics not only through free texts but also through their layouts. While information is explicitly encoded in the free texts, the layout implicitly reveals the semantic relationships among them. In this paper, we propose a framework for mining the semantics implied by the layout. The core of our work is a new HTML document model, called the nested table model, which synthesizes the DOM model and the syntax of the HTML language. Using the nested table model, we formally define the relevancy of free texts, so that free texts can be grouped by their relevancy. Our experimental results indicate that the relevancy correctly reflects the semantics of the web page layout.
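The abstract does not give the formal definitions of the nested table model or of relevancy. Purely as an illustration of the general idea, here is a minimal sketch under a hypothetical simplification: each text fragment is located by the path of HTML table cells that enclose it, and the relevancy of two fragments is taken to be the number of enclosing cells they share. The names `NestedTableParser`, `relevancy`, and `group_by_prefix` are illustrative assumptions, not the paper's definitions, and the parser assumes well-formed markup with explicit `</td>` and `</table>` tags.

```python
from collections import defaultdict
from html.parser import HTMLParser


class NestedTableParser(HTMLParser):
    """Records each text fragment with the path of table cells enclosing it.

    Hypothetical simplification: a cell is identified by
    (table index, row index, column index) at each nesting level.
    Assumes well-formed HTML with explicit closing tags.
    """

    def __init__(self):
        super().__init__()
        self.tables = []       # one [table_id, row, col] frame per open <table>
        self.path = []         # coordinates of the currently open cells
        self.table_count = 0
        self.fragments = []    # list of (text, path tuple)

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([self.table_count, -1, -1])
            self.table_count += 1
        elif tag == "tr" and self.tables:
            self.tables[-1][1] += 1    # next row
            self.tables[-1][2] = -1    # reset column counter
        elif tag in ("td", "th") and self.tables:
            frame = self.tables[-1]
            frame[2] += 1              # next column in this row
            self.path.append((frame[0], frame[1], frame[2]))

    def handle_endtag(self, tag):
        if tag == "table" and self.tables:
            self.tables.pop()
        elif tag in ("td", "th") and self.path:
            self.path.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:                       # ignore whitespace-only runs
            self.fragments.append((text, tuple(self.path)))


def relevancy(path_a, path_b):
    """Hypothetical relevancy: count of outermost enclosing cells shared."""
    shared = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        shared += 1
    return shared


def group_by_prefix(fragments, depth):
    """Group fragments whose cell paths agree on the first `depth` levels."""
    groups = defaultdict(list)
    for text, path in fragments:
        groups[path[:depth]].append(text)
    return dict(groups)


page = """
<table>
  <tr><td>Menu</td><td>
    <table>
      <tr><td>Title</td></tr>
      <tr><td>Body text</td></tr>
    </table>
  </td></tr>
</table>
"""
parser = NestedTableParser()
parser.feed(page)
parser.close()
frags = dict(parser.fragments)
print(relevancy(frags["Title"], frags["Body text"]))  # shared outer cell
print(group_by_prefix(parser.fragments, 1))
```

In this toy page, "Title" and "Body text" share the outer right-hand cell and so score higher than "Menu" and "Title", which share none; grouping at depth 1 places the first two together. Counting shared cells is only one plausible proxy for layout-based relatedness; the paper's actual relevancy measure may differ.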
Citation:
Minghong Zhou, Rubao Li, Wei Li, "Understanding the Web Page Layout," icdmw, pp.438-442, Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06), 2006