Web Intelligence and Intelligent Agent Technology, IEEE/WIC/ACM International Conference on (2011)
Lyon, France
Aug. 22, 2011 to Aug. 27, 2011
ISBN: 978-0-7695-4513-4
pp: 112-119
Clustering of search engine queries has attracted significant attention in recent years. Many search engine applications, such as query recommendation, require query clustering as a prerequisite to function properly. Indeed, clustering is necessary to unlock the true value of query logs. However, clustering search queries effectively is quite challenging because user input is highly diverse and arbitrary. Search queries are usually short and ambiguous in terms of user requirements: many different queries may refer to a single concept, while a single query may cover many concepts. Prevalent existing clustering methods, such as K-Means or DBSCAN, cannot assure good results in such a diverse environment. Agglomerative clustering gives good results but is computationally quite expensive. This paper presents a novel clustering approach based on a key insight: search engine results might themselves be used to identify query similarity. We propose a novel similarity metric for diverse queries based on the ranked URL results that a search engine returns for each query. This metric is used to develop a very efficient and accurate algorithm for clustering queries. Our experimental results demonstrate that our approach achieves more accurate clustering, better scalability, and greater robustness than known baselines.
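The abstract does not define the similarity metric, so the following is only a minimal Python sketch of the general idea, under assumptions that are not from the paper. It assumes a hypothetical rank-weighted cosine similarity, in which the URL at rank r gets weight 1/r, so that URLs near the top of both lists count most. It then groups queries by linking every pair that scores at or above an arbitrary threshold (0.3 here) and taking connected components. The names `rank_weighted_similarity`, `cluster_queries`, and `EXAMPLE_RESULTS`, the weighting, the threshold, and the toy URLs are all hypothetical illustrations. The pairwise loop is quadratic in the number of queries, so this is not the paper's efficient algorithm.

```python
from itertools import combinations


def rank_weighted_similarity(urls_a, urls_b):
    """Hypothetical rank-aware similarity between two top-k URL lists.

    Each list becomes a sparse vector in which the URL at 1-based rank r
    has weight 1/r; the score is the cosine of the two vectors. Identical
    lists score 1.0, lists with no URL in common score 0.0, and a shared
    URL counts more when it appears near the top of both lists.
    Assumes each list contains distinct URLs.
    """
    wa = {url: 1.0 / r for r, url in enumerate(urls_a, start=1)}
    wb = {url: 1.0 / r for r, url in enumerate(urls_b, start=1)}
    shared = sum(wa[u] * wb[u] for u in wa.keys() & wb.keys())
    norm = (sum(w * w for w in wa.values()) * sum(w * w for w in wb.values())) ** 0.5
    return shared / norm if norm else 0.0


def cluster_queries(results, threshold=0.3):
    """Group queries whose top-k results are similar enough.

    `results` maps each query to its ranked list of result URLs. Any pair
    scoring at or above `threshold` (an arbitrary, hypothetical default)
    is linked, and each connected component becomes one cluster, tracked
    with union-find. The pairwise loop is quadratic in the number of
    queries; it illustrates the idea, not the paper's scalable algorithm.
    """
    parent = {q: q for q in results}

    def find(q):
        # Walk to the root, halving the path as we go.
        while parent[q] != q:
            parent[q] = parent[parent[q]]
            q = parent[q]
        return q

    for a, b in combinations(results, 2):
        if rank_weighted_similarity(results[a], results[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for q in results:
        clusters.setdefault(find(q), []).append(q)
    # Sort clusters and their members for deterministic output.
    return sorted(sorted(c) for c in clusters.values())


# Toy top-3 result lists; the URLs are placeholders, not real search results.
EXAMPLE_RESULTS = {
    "jaguar car": ["jaguar.com", "caranddriver.com/jaguar", "wikipedia.org/Jaguar_Cars"],
    "jaguar xf": ["jaguar.com", "wikipedia.org/Jaguar_XF", "caranddriver.com/jaguar"],
    "jaguar animal": ["wikipedia.org/Jaguar", "nationalgeographic.com/jaguar", "worldwildlife.org/jaguar"],
    "big cats": ["nationalgeographic.com/jaguar", "wikipedia.org/Big_cat", "worldwildlife.org/jaguar"],
}

if __name__ == "__main__":
    print(cluster_queries(EXAMPLE_RESULTS))
    # prints [['big cats', 'jaguar animal'], ['jaguar car', 'jaguar xf']]
```

On the toy data, the ambiguous "jaguar" queries split by intent. "big cats" joins "jaguar animal" even though the two share no query terms, because their result lists overlap. This is the kind of signal that term-based similarity would miss.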
Search engine query clustering, Top-k search results, Clustering validation

H. Lu, J. Vaidya and Y. Hong, "Search Engine Query Clustering Using Top-k Search Results," 2011 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (WI-IAT), Lyon, 2011, pp. 112-119.