Issue No.04 - July/August (2002 vol.14)
pp: 768-791
ABSTRACT
<p>Because the Web encourages hypertext and hypermedia document authoring (e.g., HTML or XML), Web authors tend to create <it>documents</it> that are composed of multiple pages connected with hyperlinks. A Web document may be authored in multiple ways, such as 1) all information in one physical page or 2) a main page with the related information in separate linked pages. Existing Web search engines, however, return only <it>physical pages</it> that contain the query keywords. In this paper, we introduce the concept of an <it>information unit</it>, which can be viewed as a <it>logical</it> Web document consisting of multiple <it>physical</it> pages treated as one atomic retrieval unit. We present an algorithm to efficiently retrieve information units. Our algorithm can also perform progressive query processing. These functionalities are essential for information retrieval on the Web and in large XML databases. We also present experimental results on synthetic graphs and real Web data.</p>
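To make the idea of an information unit concrete, here is a minimal sketch, assuming a site is modeled as an undirected link graph and each page has a set of keywords. It is not the authors' algorithm. It is a naive swapped-in enumeration in which a unit is a connected group of pages that together cover all query keywords. Growing the allowed group size one page at a time stands in for structural query relaxation, and yielding each unit as soon as it is found mimics progressive processing. The function name `information_units`, the `max_pages` bound, and the "not a superset of an earlier unit" minimality rule are hypothetical choices made for illustration.

```python
from typing import Dict, FrozenSet, Iterator, Set

# Hypothetical model: page -> pages it links to or is linked from (undirected).
Graph = Dict[str, Set[str]]


def information_units(graph: Graph, keywords: Dict[str, Set[str]],
                      query: Set[str], max_pages: int = 3) -> Iterator[FrozenSet[str]]:
    """Yield connected page groups covering every query keyword, smallest first.

    Illustrative sketch only: each increase in group size acts as one
    structural relaxation step, and units are yielded as soon as they are found.
    """
    found = []
    # Connected groups of the current size, starting from single pages.
    frontier = {frozenset([page]) for page in graph}
    for size in range(1, max_pages + 1):
        for group in sorted(frontier, key=sorted):  # deterministic order
            covered = set().union(*(keywords.get(p, set()) for p in group))
            # Keep only minimal units: skip supersets of units already returned.
            if query <= covered and not any(unit <= group for unit in found):
                found.append(group)
                yield group
        if size < max_pages:
            # Extend each group by one linked page to get connected groups of size + 1.
            frontier = {group | {n} for group in frontier
                        for p in group for n in graph[p] if n not in group}
```

For example, with a main page linked to pages `a` and `b`, and `b` linked to `c`, a query whose keywords are split between `main` and `a` returns the two-page unit `{main, a}` rather than either page alone:

```python
graph = {"main": {"a", "b"}, "a": {"main"}, "b": {"main", "c"}, "c": {"b"}}
keywords = {"main": {"xml"}, "a": {"query"}, "b": set(), "c": {"relaxation"}}
print(list(information_units(graph, keywords, {"xml", "query"})))
```

Exhaustive enumeration like this grows combinatorially with group size, which is why a practical system needs a more efficient retrieval strategy than this sketch.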
INDEX TERMS
Web proximity search, link structures, query relaxation, progressive processing.
CITATION
Wen-Syan Li, K. Selçuk Candan, Quoc Vu, Divyakant Agrawal, "Query Relaxation by Structure and Semantics for Retrieval of Logical Web Documents", IEEE Transactions on Knowledge & Data Engineering, vol.14, no. 4, pp. 768-791, July/August 2002, doi:10.1109/TKDE.2002.1019213