Issue No. 02 - February (2012 vol. 24)

ISSN: 1041-4347

pp: 295-308

DOI: http://doi.ieeecomputersociety.org/10.1109/TKDE.2010.230

Yansheng Lu, Huazhong University of Science and Technology, Wuhan

Xiaofang Zhou, The University of Queensland, Brisbane

Gabriel Pui Cheong Fung, Arizona State University, Tempe

Hu Xu, Huazhong University of Science and Technology, Wuhan

Shazia Sadiq, The University of Queensland, Brisbane

Ke Deng, The University of Queensland, Brisbane

ABSTRACT

Given a data point set $D$, a query point set $Q$, and an integer $k$, the Group Nearest Group (GNG) query finds a subset $\omega$ ($|\omega| \le k$) of points from $D$ such that the total distance from all points in $Q$ to their nearest points in $\omega$ is not greater than that of any other subset $\omega'$ ($|\omega'| \le k$) of points in $D$. The GNG query is an NP-hard, partition-based clustering problem that arises in many real applications. In this paper, the Exhaustive Hierarchical Combination (EHC) algorithm and the Subset Hierarchical Refinement (SHR) algorithm are developed for GNG query processing. While EHC can provide the optimal solution for $k = 2$, SHR is an efficient approximate approach that combines database techniques with a local search heuristic. Our approaches focus on minimizing the access and evaluation of subsets of cardinality $k$ in $D$, since the number of such subsets, $\binom{|D|}{k}$, is vastly greater than $|D|$. To this end, hierarchical blocks of data points at a high level are used to find an intermediate solution, which is then refined at a low level by following a guided search direction so that irrelevant subsets are pruned. Comprehensive experiments on both real and synthetic data sets demonstrate the superiority of SHR in terms of efficiency and quality.
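To make the query definition concrete, the following is a minimal sketch, not the authors' EHC or SHR algorithms. It assumes 2-D points and Euclidean distance (the abstract does not fix a metric), and scores a candidate subset $\omega$ by $\sum_{q \in Q} \min_{p \in \omega} \operatorname{dist}(q, p)$. The function names `gng_cost`, `gng_exhaustive`, and `gng_swap_local_search` are hypothetical. `gng_exhaustive` is a naive exact baseline that enumerates all $\binom{|D|}{k}$ subsets, which shows why the number of subsets dominates the processing cost. `gng_swap_local_search` is a generic first-improvement swap heuristic in the style of classic k-median local search. It is shown only to illustrate what "local search" means here and is not SHR.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

# Assumption: points are 2-D tuples and distance is Euclidean.
Point = Tuple[float, float]


def gng_cost(query: Sequence[Point], omega: Sequence[Point]) -> float:
    """Total distance from every query point in Q to its nearest point in omega."""
    return sum(min(math.dist(q, p) for p in omega) for q in query)


def gng_exhaustive(data: Sequence[Point], query: Sequence[Point], k: int):
    """Exact GNG by brute force over every k-subset of D (C(|D|, k) candidates).

    Subsets smaller than k never need checking: adding a point to omega can only
    keep or shrink each query point's nearest distance, so some subset of size
    min(k, |D|) is always optimal.
    """
    size = min(k, len(data))
    best = min(combinations(data, size), key=lambda omega: gng_cost(query, omega))
    return best, gng_cost(query, best)


def gng_swap_local_search(data: Sequence[Point], query: Sequence[Point], k: int):
    """Generic swap-based local search (hypothetical illustration, not SHR).

    Start from the first min(k, |D|) points of D, then accept any single swap
    (one point of omega replaced by one point outside it) that lowers the cost.
    Stop when no swap improves the cost. The result is a local optimum only.
    """
    size = min(k, len(data))
    omega: List[Point] = list(data[:size])
    cost = gng_cost(query, omega)
    improved = True
    while improved:
        improved = False
        for i in range(size):
            for cand in data:
                if cand in omega:
                    continue
                trial = omega[:i] + [cand] + omega[i + 1:]
                trial_cost = gng_cost(query, trial)
                if trial_cost < cost - 1e-12:  # strict improvement guarantees termination
                    omega, cost, improved = trial, trial_cost, True
    return tuple(omega), cost


if __name__ == "__main__":
    D = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
    Q = [(0.0, 0.0), (0.5, 0.0), (10.5, 0.0), (11.0, 0.0)]
    print(gng_exhaustive(D, Q, 2))         # ((0.0, 0.0), (11.0, 0.0)), 1.0
    print(gng_swap_local_search(D, Q, 2))  # same subset (possibly reordered), 1.0
```

On this toy input both functions return the pair $\{(0, 0), (11, 0)\}$ with total distance 1.0. In general, the swap heuristic may stop at a worse local optimum. The exhaustive version is exact but quickly becomes infeasible as $|D|$ and $k$ grow.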

INDEX TERMS

k-median clustering, group nearest group query, group nearest neighbor query.

CITATION

Yansheng Lu, Xiaofang Zhou, Gabriel Pui Cheong Fung, Hu Xu, Shazia Sadiq, Ke Deng, "On Group Nearest Group Query Processing",

*IEEE Transactions on Knowledge & Data Engineering*, vol. 24, no. 2, pp. 295-308, February 2012, doi:10.1109/TKDE.2010.230