Query Relaxation by Structure and Semantics for Retrieval of Logical Web Documents
July/August 2002 (vol. 14 no. 4)
pp. 768-791

Since the Web encourages hypertext and hypermedia document authoring (e.g., HTML or XML), Web authors tend to create documents that are composed of multiple pages connected with hyperlinks. A Web document may be authored in multiple ways, such as 1) with all information in one physical page, or 2) with a main page and the related information in separate linked pages. Existing Web search engines, however, return only physical pages containing keywords. In this paper, we introduce the concept of an information unit, which can be viewed as a logical Web document that groups multiple physical pages into one atomic retrieval unit. We present an algorithm to efficiently retrieve information units. Our algorithm can also perform progressive query processing. These functionalities are essential for information retrieval on the Web and in large XML databases. We also present experimental results on synthetic graphs and real Web data.
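To make the notion of an information unit concrete, the following is a minimal sketch and not the algorithm presented in the paper. It assumes the link structure is given as an undirected adjacency dictionary and that each page maps to a set of keywords. It uses a simple shortest-path heuristic: from each candidate root page it connects the nearest page containing each query keyword and keeps the smallest resulting page set. The names `find_information_unit`, `graph`, and `page_keywords`, and the sample pages, are hypothetical.

```python
from collections import deque


def bfs_parents(graph, start):
    """Breadth-first search; returns {page: predecessor} for reachable pages."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for neighbor in graph.get(page, ()):
            if neighbor not in parent:
                parent[neighbor] = page
                queue.append(neighbor)
    return parent


def path_to(parent, target):
    """Pages on the BFS path from target back to the search root."""
    path = []
    while target is not None:
        path.append(target)
        target = parent[target]
    return path


def find_information_unit(graph, page_keywords, query):
    """Heuristic: smallest connected page set found that covers every keyword.

    Returns a set of pages, or None if some keyword cannot be covered.
    This is an illustrative heuristic, not an optimal or published method.
    """
    best = None
    for root in sorted(graph):
        parent = bfs_parents(graph, root)
        pages = {root}
        covered = True
        for keyword in query:
            hits = [p for p in parent if keyword in page_keywords.get(p, set())]
            if not hits:
                covered = False
                break
            # Nearest matching page; ties broken by page name for determinism.
            nearest = min(hits, key=lambda p: (len(path_to(parent, p)), p))
            pages.update(path_to(parent, nearest))
        if covered and (best is None or len(pages) < len(best)):
            best = pages
    return best


# Hypothetical site: a main page linked to separate pages.
graph = {
    "home": ["people", "projects"],
    "people": ["home"],
    "projects": ["home", "news"],
    "news": ["projects"],
}
page_keywords = {
    "home": {"lab"},
    "people": {"faculty"},
    "projects": {"xml"},
    "news": {"xml", "faculty"},
}

# No single page holds both keywords, so the unit spans two linked pages.
print(find_information_unit(graph, page_keywords, {"lab", "faculty"}))
# Both keywords occur on one physical page, so the unit is that page alone.
print(find_information_unit(graph, page_keywords, {"xml", "faculty"}))
```

In this sketch, a keyword-only search for "lab" and "faculty" would return nothing, because no single physical page contains both terms, while the linked pair of pages answers the query as one logical document.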

Index Terms:
Web proximity search, link structures, query relaxation, progressive processing.
Wen-Syan Li, K. Selçuk Candan, Quoc Vu, Divyakant Agrawal, "Query Relaxation by Structure and Semantics for Retrieval of Logical Web Documents," IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 4, pp. 768-791, July-Aug. 2002, doi:10.1109/TKDE.2002.1019213