Issue No.02 - February (2012 vol.24)

pp: 295-308

Ke Deng, The University of Queensland, Brisbane

Shazia Sadiq, The University of Queensland, Brisbane

Xiaofang Zhou, The University of Queensland, Brisbane

Hu Xu, Huazhong University of Science and Technology, Wuhan

Gabriel Pui Cheong Fung, Arizona State University, Tempe

Yansheng Lu, Huazhong University of Science and Technology, Wuhan

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2010.230

ABSTRACT

Given a data point set $D$, a query point set $Q$, and an integer $k$, the Group Nearest Group (GNG) query finds a subset $\omega$ ($|\omega| \le k$) of points from $D$ such that the total distance from all points in $Q$ to their nearest points in $\omega$ is no greater than that for any other subset $\omega'$ ($|\omega'| \le k$) of points in $D$. The GNG query is a partition-based clustering problem that arises in many real applications and is NP-hard. In this paper, the Exhaustive Hierarchical Combination (EHC) algorithm and the Subset Hierarchical Refinement (SHR) algorithm are developed for GNG query processing. While EHC can provide the optimal solution for $k=2$, SHR is an efficient approximate approach that combines database techniques with a local search heuristic. Our approaches focus on minimizing the access and evaluation of subsets of cardinality $k$ in $D$, since the number of such subsets is exponentially greater than $|D|$. To do so, hierarchical blocks of data points at a high level are used to find an intermediate solution, which is then refined by following a guided search direction at a low level so as to prune irrelevant subsets. Comprehensive experiments on both real and synthetic data sets demonstrate the superiority of SHR in terms of efficiency and quality.
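
To make the query definition concrete, the following is a minimal sketch of the GNG objective together with a naive exhaustive baseline. It is not the EHC or SHR algorithm from the paper; it only illustrates what is being optimized and why enumerating subsets of cardinality k is costly. The function names `gng_cost` and `gng_brute_force` are hypothetical, and Euclidean distance, a non-empty D, and k >= 1 are assumptions made for the illustration.

```python
from itertools import combinations
from math import dist, inf


def gng_cost(omega, Q):
    """Total distance from every query point in Q to its nearest point in omega."""
    return sum(min(dist(q, p) for p in omega) for q in Q)


def gng_brute_force(D, Q, k):
    """Naive baseline: enumerate candidate subsets of D and keep the cheapest.

    Adding a point to omega can never increase the cost, so it is enough
    to check subsets of size min(k, |D|) instead of every size up to k.
    Assumes D is non-empty and k >= 1 (illustrative assumption).
    """
    size = min(k, len(D))
    best_subset, best_cost = None, inf
    for omega in combinations(D, size):  # C(|D|, size) candidates
        cost = gng_cost(omega, Q)
        if cost < best_cost:
            best_subset, best_cost = omega, cost
    return best_subset, best_cost
```

For example, calling `gng_brute_force(D, Q, 2)` on a handful of 2-D points returns the pair of data points that minimizes the summed nearest-point distance over Q. The loop visits C(|D|, k) subsets, which grows rapidly with |D| and k; this is the cost that the hierarchical pruning described in the abstract is meant to avoid.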

INDEX TERMS

K-median clustering, group nearest group query, group nearest neighbor query.

CITATION

Ke Deng, Shazia Sadiq, Xiaofang Zhou, Hu Xu, Gabriel Pui Cheong Fung, Yansheng Lu, "On Group Nearest Group Query Processing", *IEEE Transactions on Knowledge & Data Engineering*, vol. 24, no. 2, pp. 295-308, February 2012, doi:10.1109/TKDE.2010.230