Issue No.02 - February (2012 vol.24)
pp: 295-308
Ke Deng , The University of Queensland, Brisbane
Shazia Sadiq , The University of Queensland, Brisbane
Xiaofang Zhou , The University of Queensland, Brisbane
Hu Xu , Huazhong University of Science and Technology, Wuhan
Gabriel Pui Cheong Fung , Arizona State University, Tempe
Yansheng Lu , Huazhong University of Science and Technology, Wuhan
ABSTRACT
Given a data point set $D$, a query point set $Q$, and an integer $k$, the Group Nearest Group (GNG) query finds a subset $\omega$ ($|\omega| \le k$) of points from $D$ such that the total distance from all points in $Q$ to their nearest point in $\omega$ is no greater than that for any other subset $\omega'$ ($|\omega'| \le k$) of points in $D$. The GNG query is a partition-based clustering problem that arises in many real applications and is NP-hard. In this paper, the Exhaustive Hierarchical Combination (EHC) algorithm and the Subset Hierarchical Refinement (SHR) algorithm are developed for GNG query processing. While EHC can provide the optimal solution for $k=2$, SHR is an efficient approximate approach that combines database techniques with a local search heuristic. Both approaches focus on minimizing the access and evaluation of subsets of cardinality $k$ in $D$, since the number of such subsets is exponentially greater than $|D|$. To that end, hierarchical blocks of data points at a high level are used to find an intermediate solution, which is then refined by following a guided search direction at a lower level so as to prune irrelevant subsets. Comprehensive experiments on both real and synthetic data sets demonstrate the superiority of SHR in terms of efficiency and quality.
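To make the query definition concrete, the following minimal Python sketch evaluates the GNG objective and solves small instances in two naive ways: exhaustive enumeration of all subsets of size at most k, and a plain single-swap local search. It is an illustration only and is not the EHC or SHR algorithm from the paper: it uses no hierarchical blocks, index, or pruning. The function names (gng_cost, gng_brute_force, gng_local_search), the Euclidean distance, and the choice of the first k points as the starting solution are all assumptions made for the example.

```python
from itertools import combinations
from math import dist  # Euclidean distance (assumed metric), Python 3.8+


def gng_cost(subset, Q):
    """Total distance from every query point in Q to its nearest point in subset."""
    return sum(min(dist(q, p) for p in subset) for q in Q)


def gng_brute_force(D, Q, k):
    """Exact answer by trying every non-empty subset of D with at most k points.

    The number of subsets grows combinatorially, so this is only
    practical for very small inputs.
    """
    best, best_cost = None, float("inf")
    for size in range(1, k + 1):
        for subset in combinations(D, size):
            cost = gng_cost(subset, Q)
            if cost < best_cost:
                best, best_cost = subset, cost
    return best, best_cost


def gng_local_search(D, Q, k):
    """Approximate answer by single-swap local search (hypothetical baseline).

    Starts from the first k points of D and keeps replacing one chosen
    point with an unchosen one while the total cost strictly decreases.
    """
    current = list(D[:k])
    current_cost = gng_cost(current, Q)
    improved = True
    while improved:
        improved = False
        for i in range(len(current)):
            for candidate in D:
                if candidate in current:
                    continue
                trial = current[:i] + [candidate] + current[i + 1:]
                trial_cost = gng_cost(trial, Q)
                if trial_cost < current_cost:
                    current, current_cost = trial, trial_cost
                    improved = True
    return tuple(current), current_cost
```

Points are assumed to be coordinate tuples, for example D = [(0, 0), (1, 0), (10, 0), (11, 0)] and Q = [(0, 0), (1, 0), (10, 0)]. The local search may stop at a local optimum, so its cost is never lower than the brute-force cost and can be higher.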
INDEX TERMS
K-median clustering, group nearest group query, group nearest neighbor query.
CITATION
Ke Deng, Shazia Sadiq, Xiaofang Zhou, Hu Xu, Gabriel Pui Cheong Fung, Yansheng Lu, "On Group Nearest Group Query Processing", IEEE Transactions on Knowledge & Data Engineering, vol.24, no. 2, pp. 295-308, February 2012, doi:10.1109/TKDE.2010.230