Issue No.10 - October (2011 vol.23)
pp: 1526-1540
Feifei Li , Florida State University, Tallahassee
Bin Yao , Florida State University, Tallahassee
Piyush Kumar , Florida State University, Tallahassee
ABSTRACT
Given a set of points P and a query set Q, a group enclosing query (Geq) fetches the point p* ∈ P such that the maximum distance of p* to all points in Q is minimized. This problem is equivalent to the Min-Max case (minimizing the maximum distance) of aggregate nearest neighbor queries for spatial databases [27]. This work first designs a new exact solution by exploring new geometric insights, such as the minimum enclosing ball, the convex hull, and the furthest Voronoi diagram of the query group. To further reduce the query cost, especially as the dimensionality increases, we turn to approximation algorithms. Our main approximation algorithm has a worst-case \sqrt{2}-approximation ratio if one can find the exact nearest neighbor of a point. In practice, its approximation ratio never exceeds 1.05 for a large number of data sets up to six dimensions. We also discuss how to extend it to higher dimensions (up to 74 in our experiments) and show that it still maintains a very good approximation quality (still close to 1) and low query cost. In fixed dimensions, we extend the \sqrt{2}-approximation algorithm to get a (1 + ε)-approximate solution for the Geq problem. Both approximation algorithms have O(\log N + M) query cost in any fixed dimension, where N and M are the sizes of the data set P and the query group Q, respectively. Extensive experiments on both synthetic and real data sets, with up to 10 million points and 74 dimensions, confirm the efficiency, effectiveness, and scalability of the proposed algorithms, especially their significant improvement over the state-of-the-art method.
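To make the Geq definition concrete, here is a minimal sketch that contrasts a brute-force exact answer with a nearest-neighbor-of-the-center heuristic. The abstract does not spell out the \sqrt{2}-approximation algorithm, so this sketch assumes it returns the nearest neighbor in P of the center of Q's minimum enclosing ball. It also swaps in two simplifications that are not the paper's method: the exact minimum enclosing ball is replaced by the Bădoiu-Clarkson iterative approximation of its center, and both searches are plain linear scans rather than index-based (e.g., R-tree) searches. All function names (`maxdist`, `geq_exact`, `approx_meb_center`, `geq_approx`) are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def maxdist(p: Point, Q: Sequence[Point]) -> float:
    """Min-Max aggregate distance: the largest Euclidean distance from p to any q in Q."""
    return max(math.dist(p, q) for q in Q)


def geq_exact(P: Sequence[Point], Q: Sequence[Point]) -> Tuple[Point, float]:
    """Brute-force Geq baseline: scan all of P in O(N * M) time (no index, no pruning)."""
    best = min(P, key=lambda p: maxdist(p, Q))
    return best, maxdist(best, Q)


def approx_meb_center(Q: Sequence[Point], iterations: int = 1000) -> Point:
    """Approximate the minimum-enclosing-ball center of Q (Badoiu-Clarkson iteration):
    repeatedly move the current center toward the furthest point of Q by 1/(i+1)."""
    c = list(Q[0])
    for i in range(1, iterations + 1):
        far = max(Q, key=lambda q: math.dist(c, q))
        step = 1.0 / (i + 1)
        c = [ci + step * (fi - ci) for ci, fi in zip(c, far)]
    return tuple(c)


def geq_approx(P: Sequence[Point], Q: Sequence[Point]) -> Tuple[Point, float]:
    """Assumed heuristic: the nearest neighbor in P of Q's (approximate) MEB center.
    With an exact center and an exact nearest neighbor, the result's maxdist is
    at most sqrt(2) times the optimum; the approximate center loosens this slightly."""
    c = approx_meb_center(Q)
    p = min(P, key=lambda x: math.dist(c, x))
    return p, maxdist(p, Q)


if __name__ == "__main__":
    import random

    rng = random.Random(42)
    P = [tuple(rng.random() for _ in range(3)) for _ in range(2000)]
    Q = [tuple(rng.random() for _ in range(3)) for _ in range(25)]
    _, opt = geq_exact(P, Q)
    _, apx = geq_approx(P, Q)
    print(f"exact={opt:.4f} approx={apx:.4f} ratio={apx / opt:.4f}")
```

The heuristic touches P only through a single nearest-neighbor search, which is why an indexed version can answer in time independent of M beyond computing the center; the exact scan, by contrast, evaluates the aggregate distance for every point of P.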
INDEX TERMS
Aggregate nearest neighbor, approximate nearest neighbor, minmax nearest neighbor, nearest neighbor.
CITATION
Feifei Li, Bin Yao, Piyush Kumar, "Group Enclosing Queries", IEEE Transactions on Knowledge & Data Engineering, vol.23, no. 10, pp. 1526-1540, October 2011, doi:10.1109/TKDE.2010.181
REFERENCES
[1] CGAL, Computational Geometry Algorithms Library, http://www.cgal.org, 2011.
[2] Open Street Map, http://www.openstreetmap.org, 2011.
[3] Qhull, The Quickhull Algorithm for Convex Hulls, http://www.qhull.org, 2011.
[4] S. Arya and D.M. Mount, "Approximate Range Searching," Computational Geometry: Theory and Applications, vol. 17, nos. 3/4, pp. 135-152, 2000.
[5] S. Arya, D.M. Mount, N.S. Netanyahu, R. Silverman, and A.Y. Wu, "An Optimal Algorithm for Approximate Nearest Neighbor Searching in Fixed Dimensions," J. ACM, vol. 45, no. 6, pp. 891-923, 1998.
[6] F. Aurenhammer, "Voronoi Diagrams - A Survey of a Fundamental Geometric Data Structure," ACM Computing Surveys, vol. 23, no. 3, pp. 345-405, 1991.
[7] M. Bădoiu, S. Har-Peled, and P. Indyk, "Approximate Clustering via Core-Sets," Proc. Ann. ACM Symp. Theory of Computing (STOC), 2002.
[8] N. Beckmann, H.P. Kriegel, R. Schneider, and B. Seeger, "The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles," Proc. ACM SIGMOD Int'l Conf. Management of Data, 1990.
[9] M. Berg, M. Kreveld, M. Overmars, and O. Schwarzkopf, Computational Geometry: Algorithms and Applications. Springer, 1997.
[10] C. Böhm, "A Cost Model for Query Processing in High Dimensional Data Spaces," ACM Trans. Database Systems, vol. 25, no. 2, pp. 129-178, 2000.
[11] K. Chakrabarti, K. Porkaew, and S. Mehrotra, The Color Data Set, http://kdd.ics.uci.edu/databases/CorelFeatures/CorelFeatures.data.html, 2011.
[12] B. Cui, B.C. Ooi, J. Su, and K.-L. Tan, "Contorting High Dimensional Data for Efficient Main Memory kNN Processing," Proc. ACM SIGMOD Int'l Conf. Management of Data, 2003.
[13] R. Fagin, R. Kumar, and D. Sivakumar, "Efficient Similarity Search and Classification via Rank Aggregation," Proc. ACM SIGMOD Int'l Conf. Management of Data, 2003.
[14] K. Fischer and B. Gärtner, "The Smallest Enclosing Ball of Balls: Combinatorial Structure and Algorithms," Proc. Ann. Symp. Computational Geometry (SoCG), 2003.
[15] A. Gionis, P. Indyk, and R. Motwani, "Similarity Search in High Dimensions via Hashing," Proc. Int'l Conf. Very Large Data Bases (VLDB), 1999.
[16] M.T. Goodrich, J.-J. Tsay, D.E. Vengroff, and J.S. Vitter, "External-Memory Computational Geometry," Proc. IEEE Ann. Foundations of Computer Science (FOCS), 1993.
[17] A. Guttman, "R-Trees: A Dynamic Index Structure for Spatial Searching," Proc. ACM SIGMOD Int'l Conf. Management of Data, 1984.
[18] M. Hadjieleftheriou, "The Spatialindex Library," http://www.research.att.com/~marioh/spatialindex/index.html, 2011.
[19] G.R. Hjaltason and H. Samet, "Distance Browsing in Spatial Databases," ACM Trans. Database Systems, vol. 24, no. 2, pp. 265-318, 1999.
[20] P. Indyk and R. Motwani, "Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality," Proc. Ann. ACM Symp. Theory of Computing (STOC), 1998.
[21] H.V. Jagadish, B.C. Ooi, K.-L. Tan, C. Yu, and R. Zhang, "iDistance: An Adaptive B+-Tree Based Indexing Method for Nearest Neighbor Search," ACM Trans. Database Systems, vol. 30, no. 2, pp. 364-397, 2005.
[22] P. Kumar, J.S.B. Mitchell, and E.A. Yildirim, "Approximate Minimum Enclosing Balls in High Dimensions Using Core-Sets," ACM J. Experimental Algorithmics, vol. 8, pp. 1-29, 2003.
[23] Y. LeCun and C. Cortes, "The MNIST Data Set," http://yann.lecun.com/exdb/mnist/, 2011.
[24] H. Li, H. Lu, B. Huang, and Z. Huang, "Two Ellipse-Based Pruning Methods for Group Nearest Neighbor Queries," Proc. Ann. ACM Int'l Workshop Geographic Information Systems (GIS), 2005.
[25] K. Mouratidis, D. Papadias, and S. Papadimitriou, "Tree-Based Partition Querying: A Methodology for Computing Medoids in Large Spatial Datasets," VLDB J., vol. 17, no. 4, pp. 923-945, 2008.
[26] D. Papadias, Q. Shen, Y. Tao, and K. Mouratidis, "Group Nearest Neighbor Queries," Proc. Int'l Conf. Data Eng. (ICDE), 2004.
[27] D. Papadias, Y. Tao, K. Mouratidis, and C.K. Hui, "Aggregate Nearest Neighbor Queries in Spatial Databases," ACM Trans. Database Systems, vol. 30, no. 2, pp. 529-576, 2005.
[28] G. Proietti and C. Faloutsos, "Analysis of Range Queries and Self-Spatial Join Queries on Real Region Datasets Stored Using an R-Tree," IEEE Trans. Knowledge and Data Eng., vol. 12, no. 5, pp. 751-762, Sept./Oct. 2000.
[29] K. Rose and B.S. Manjunath, "The CORTINA Data Set," http://www.scl.ece.ucsb.edu/datasets/index.htm, 2011.
[30] N. Roussopoulos, S. Kelley, and F. Vincent, "Nearest Neighbor Queries," Proc. ACM SIGMOD Int'l Conf. Management of Data, 1995.
[31] Y. Tao, K. Yi, C. Sheng, and P. Kalnis, "Quality and Efficiency in High Dimensional Nearest Neighbor Search," Proc. ACM SIGMOD Int'l Conf. Management of Data, 2009.
[32] Y. Theodoridis and T. Sellis, "A Model for the Prediction of R-Tree Performance," Proc. ACM SIGACT-SIGMOD-SIGART Symp. Principles of Database Systems (PODS), 1996.
[33] B. Yao, F. Li, and P. Kumar, "Reverse Furthest Neighbors in Spatial Databases," Proc. IEEE Int'l Conf. Data Eng. (ICDE), 2009.
[34] C. Yu, B.C. Ooi, K.-L. Tan, and H.V. Jagadish, "Indexing the Distance: An Efficient Method to KNN Processing," Proc. Int'l Conf. Very Large Data Bases (VLDB), 2001.
[35] H. Yu, P.K. Agarwal, R. Poreddy, and K.R. Varadarajan, "Practical Methods for Shape Fitting and Kinetic Data Structures Using Core Sets," Proc. Ann. Symp. Computational Geometry (SoCG), 2004.
[36] D. Zhang, Y. Du, T. Xia, and Y. Tao, "Progressive Computation of the Min-Dist Optimal-Location Query," Proc. Int'l Conf. Very Large Data Bases (VLDB), 2006.