Issue No.03 - March (2013 vol.25)
pp: 481-493
O. Kucuktunc, Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
H. Ferhatosmanoglu, Dept. of Comput. Eng., Bilkent Univ., Ankara, Turkey
ABSTRACT
Traditional search methods try to obtain the most relevant information and rank it according to its degree of similarity to the query. Many applications also prefer diversity in query results, since results that are very similar to each other cannot capture all aspects of the queried topic. In this paper, we focus on the λ-diverse k-nearest neighbor search problem on spatial and multidimensional data. Unlike approaches that diversify query results in a postprocessing step, we naturally obtain diverse results with the proposed geometric and index-based methods. We first make an analogy with the concept of Natural Neighbors (NatN) and propose a natural neighbor-based method for 2D and 3D data and an incremental browsing algorithm based on Gabriel graphs for higher dimensional spaces. We then introduce a diverse browsing method based on the distance browsing feature of spatial index structures, such as R-trees. The algorithm maintains a priority queue ordered by the mindivdist of the objects, which depends on both relevancy and angular diversity, and efficiently prunes nondiverse items and nodes. We experiment with a number of spatial and high-dimensional data sets, including Factual's (http://www.factual.com/) US points-of-interest data set of 13M entries. In the experimental setup, the diverse browsing method is shown to be more efficient (regarding disk accesses) than k-NN search on R-trees, and more effective (regarding Maximal Marginal Relevance (MMR)) than the diverse nearest neighbor search techniques found in the literature.
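To make the Gabriel-graph idea concrete, the following is a minimal brute-force sketch of the standard Gabriel neighbor test: p is a Gabriel neighbor of q when no other point lies in the closed ball whose diameter is the segment qp. It is not the paper's incremental browsing algorithm, which the abstract does not spell out. The function names `sq_dist` and `gabriel_neighbors` are hypothetical, and treating boundary points as blocking is an assumed convention.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gabriel_neighbors(q, points):
    """Brute-force Gabriel neighbors of q among points (illustrative only).

    A point r lies in the closed ball with diameter q-p exactly when
    |q-r|^2 + |p-r|^2 <= |q-p|^2, so no square roots are needed.
    """
    neighbors = []
    for p in points:
        if p == q:
            continue
        d_qp = sq_dist(q, p)
        blocked = any(
            r != p and r != q and sq_dist(q, r) + sq_dist(p, r) <= d_qp
            for r in points
        )
        if not blocked:
            neighbors.append(p)
    return neighbors


query = (0, 0)
data = [(1, 0), (2, 0), (0, 1), (-1, 0), (3, 3)]
print(gabriel_neighbors(query, data))
```

In this toy input, (2, 0) sits directly behind (1, 0) as seen from the query and is therefore excluded, which gives an intuition for why Gabriel neighbors tend to be spread around the query rather than clustered in one direction.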
INDEX TERMS
Search problems, Nearest neighbor searches, Spatial databases, Search methods, Query processing, Diversity methods, Information retrieval, Gabriel graph, Diversity, Diverse nearest neighbor search, Angular similarity, Natural neighbors
CITATION
O. Kucuktunc, H. Ferhatosmanoglu, "λ-diverse nearest neighbors browsing for multidimensional data", IEEE Transactions on Knowledge & Data Engineering, vol. 25, no. 3, pp. 481-493, March 2013, doi:10.1109/TKDE.2011.251