Keyword Proximity Search in XML Trees
April 2006 (vol. 18 no. 4)
pp. 525-539
Recent work has shown the benefits of keyword proximity search for querying XML documents as well as text documents. For example, given query keywords over Shakespeare's plays in XML, the user might be interested in knowing how the keywords co-occur. In this paper, we focus on XML trees and define XML keyword proximity queries to return the (possibly heterogeneous) set of minimum connecting trees (MCTs) of the matches to the individual keywords in the query. We consider how to execute keyword proximity queries efficiently on labeled trees (XML) in two settings: 1) when the XML database has been preprocessed and 2) when no indices are available on the XML database. We perform a detailed experimental evaluation to study the benefits of our approach and show that our algorithms considerably outperform prior algorithms and other applicable approaches.
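
To make the notion of a minimum connecting tree concrete, the following is a minimal sketch in Python, not the algorithms evaluated in the paper. It assumes the XML tree is given as a hypothetical child-to-parent dictionary, that each keyword has a precomputed list of matching nodes, and that one MCT is produced per combination of matches (one match per keyword). The function names, the toy document, and the keywords `kwA` and `kwB` are all illustrative assumptions. The MCT is built naively as the union of the paths from each match up to their lowest common ancestor.

```python
from itertools import product


def root_path(parent, node):
    """Return the path from the root down to `node` (inclusive)."""
    path = [node]
    while parent[node] is not None:
        node = parent[node]
        path.append(node)
    return path[::-1]


def minimum_connecting_tree(parent, nodes):
    """Return (lca, node_set) for the smallest subtree connecting `nodes`.

    Naive approach: walk every node up to the root, find the deepest
    ancestor shared by all of them (the LCA), and keep the path segments
    from the LCA down to each node.
    """
    paths = [root_path(parent, n) for n in nodes]
    shared = 0
    for level in zip(*paths):
        if len(set(level)) != 1:
            break
        shared += 1
    lca = paths[0][shared - 1]
    mct_nodes = set()
    for path in paths:
        mct_nodes.update(path[shared - 1:])
    return lca, mct_nodes


def all_mcts(parent, matches):
    """Enumerate one MCT per combination of keyword matches (assumption)."""
    results = []
    for combo in product(*matches.values()):
        lca, nodes = minimum_connecting_tree(parent, combo)
        results.append((combo, lca, nodes))
    return results


# Hypothetical toy document: a play with two acts.
parent = {
    "play": None,
    "act1": "play",
    "scene1": "act1",
    "speech1": "scene1",
    "speech2": "scene1",
    "act2": "play",
    "speech3": "act2",
}
# Hypothetical keyword matches: kwA occurs in two speeches, kwB in one.
matches = {"kwA": ["speech1", "speech3"], "kwB": ["speech2"]}

for combo, lca, nodes in all_mcts(parent, matches):
    print(combo, "rooted at", lca, "with nodes", sorted(nodes))
```

In this toy input the two resulting trees have different shapes: one is rooted at `scene1` and the other at `play`, which illustrates why the answer set can be heterogeneous. Enumerating every combination and walking to the root each time is costly on large documents; the sketch is only meant to pin down what is being computed.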

Index Terms:
Lowest common ancestor, tree proximity search, XML keyword search.
Vagelis Hristidis, Nick Koudas, Yannis Papakonstantinou, Divesh Srivastava, "Keyword Proximity Search in XML Trees," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 4, pp. 525-539, April 2006, doi:10.1109/TKDE.2006.61