Returning Clustered Results for Keyword Search on XML Documents
December 2011 (vol. 23 no. 12)
pp. 1811-1825
Xiping Liu, Jiangxi University of Finance and Economics, Nanchang
Changxuan Wan, Jiangxi University of Finance and Economics, Nanchang
Lei Chen, The Hong Kong University of Science and Technology, Hong Kong
Keyword search is an effective paradigm for information discovery and has recently been introduced for querying XML documents. In this paper, we address the problem of returning clustered results for keyword search on XML documents. We first propose a novel semantics for answers to an XML keyword query. The core of the semantics is the conceptually related relationship between keyword matches, which is based on the conceptual relationship between nodes in XML trees. We then propose a new clustering methodology for XML search results that clusters results according to the way they match the given query. Two approaches to implementing the methodology are discussed. The first is a conventional approach that clusters search results after they are retrieved; the second clusters search results actively, in the manner of on-the-fly clustering. The generated clusters are then organized into a cluster hierarchy with different granularities, enabling users to locate the results of interest easily and precisely. Experimental results demonstrate the meaningfulness of the proposed semantics as well as the efficiency of the proposed methods.
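To make "clustering results according to the way they match the query" and "a cluster hierarchy with different granularities" more concrete, the following is a minimal sketch of the conventional, cluster-after-retrieval style only. It is not the authors' algorithm and does not implement their conceptual-relationship semantics. It assumes (hypothetically) that each retrieved result is described by the label path of its root node plus, for each keyword, the label path of the node matching it. The `Result` type, `match_pattern`, and `build_hierarchy` are illustrative names. Results are grouped coarsely by root path and then finely by their match pattern.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical result representation (an assumption, not from the paper):
# the label path of the result's root node and, for each keyword,
# the label path of the node that matches it.
@dataclass(frozen=True)
class Result:
    result_id: str
    root_path: str
    matches: Tuple[Tuple[str, str], ...]  # (keyword, match label path)

Pattern = Tuple[Tuple[str, str], ...]

def match_pattern(r: Result) -> Pattern:
    """Describe how a result matches the query: each keyword paired with
    its match path relative to the result root, sorted for a stable key."""
    prefix = r.root_path + "/"
    rel = []
    for kw, path in r.matches:
        if path == r.root_path:
            rel.append((kw, "."))                  # keyword matched the root itself
        elif path.startswith(prefix):
            rel.append((kw, path[len(prefix):]))   # strip the shared root prefix
        else:
            rel.append((kw, path))                 # keep unrelated paths unchanged
    return tuple(sorted(rel))

def build_hierarchy(results: List[Result]) -> Dict[str, Dict[Pattern, List[str]]]:
    """Two-level hierarchy: coarse clusters by root path, fine clusters
    by match pattern within each coarse cluster."""
    tree: Dict[str, Dict[Pattern, List[str]]] = defaultdict(lambda: defaultdict(list))
    for r in results:
        tree[r.root_path][match_pattern(r)].append(r.result_id)
    return {root: dict(sub) for root, sub in tree.items()}

if __name__ == "__main__":
    demo = [
        Result("r1", "dblp/article", (("xml", "dblp/article/title"), ("chen", "dblp/article/author"))),
        Result("r2", "dblp/article", (("xml", "dblp/article/title"), ("chen", "dblp/article/author"))),
        Result("r3", "dblp/article", (("xml", "dblp/article/journal"), ("chen", "dblp/article/author"))),
        Result("r4", "dblp/inproceedings", (("xml", "dblp/inproceedings/title"), ("chen", "dblp/inproceedings/author"))),
    ]
    for root, clusters in build_hierarchy(demo).items():
        print(root)
        for pattern, ids in clusters.items():
            print("  ", pattern, ids)
```

Expressing match paths relative to the result root lets structurally identical results from different subtrees share a cluster. Adding further hierarchy levels, or clustering on the fly during search as in the paper's second approach, would require different machinery.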

Index Terms:
XML keyword search, search results clustering, cluster hierarchy.
Citation:
Xiping Liu, Changxuan Wan, Lei Chen, "Returning Clustered Results for Keyword Search on XML Documents," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 12, pp. 1811-1825, Dec. 2011, doi:10.1109/TKDE.2011.183