Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06)
Understanding the Web Page Layout
Hong Kong, China
December 18-December 22
ISBN: 0-7695-2702-7
Web pages express their semantics not only through free text but also through their layout. While information is explicitly encoded in the free text, the layout implicitly reveals the semantic relationships among the free texts. In this paper, we propose a framework for mining the semantics implied by the layout. The core of our work is a new HTML document model, called the nested table model, which synthesizes the DOM model and the syntax of the HTML language. Using the nested table model, we formally define the relevancy of free texts, so that free texts can be grouped by their relevancy. Our experimental results indicate that this relevancy correctly reflects the semantics of the web page layout.
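The abstract does not spell out how the nested table model defines relevancy, so the following is only a hypothetical illustration of the general idea, not the authors' model. It assumes well-formed HTML in which every `<td>`/`<th>` is explicitly closed. It records, for each text fragment, the path of table cells enclosing it, scores two fragments by how many enclosing cells they share (a stand-in "relevancy"), and groups fragments that sit in the same innermost cell. The names `CellPathParser`, `relevancy`, and `group_by_cell` are invented for this sketch.

```python
from collections import defaultdict
from html.parser import HTMLParser

CELL_TAGS = {"td", "th"}


class CellPathParser(HTMLParser):
    """Record the enclosing table-cell path of every non-blank text fragment.

    Hypothetical simplification: assumes every <td>/<th> is explicitly closed.
    """

    def __init__(self):
        super().__init__()
        self._stack = []     # ids of currently open cells, outermost first
        self._next_id = 0    # unique id for the next cell opened
        self.fragments = []  # list of (text, path) pairs

    def handle_starttag(self, tag, attrs):
        if tag in CELL_TAGS:
            self._stack.append(self._next_id)
            self._next_id += 1

    def handle_endtag(self, tag):
        if tag in CELL_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only text between tags
            self.fragments.append((text, tuple(self._stack)))


def relevancy(path_a, path_b):
    """Illustrative score: number of enclosing cells two fragments share."""
    shared = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        shared += 1
    return shared


def group_by_cell(html):
    """Group text fragments that sit in the same innermost table cell."""
    parser = CellPathParser()
    parser.feed(html)
    parser.close()
    groups = defaultdict(list)
    for text, path in parser.fragments:
        groups[path].append(text)
    return dict(groups)


page = ("<table><tr><td>Menu<table><tr><td>Home</td><td>News</td></tr></table></td>"
        "<td>Title<p>Body text</p></td></tr></table>")
groups = group_by_cell(page)
# "Title" and "Body text" share one cell; "Home" and "News" share only the outer cell.
```

Grouping by the full cell path keeps text that the layout places side by side in one cell together. The shared-prefix score gives sibling cells inside the same outer cell a higher relevancy than cells in unrelated parts of the page.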
Citation:
Minghong Zhou, Rubao Li, Wei Li, "Understanding the Web Page Layout," icdmw, pp.438-442, Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06), 2006