2010 IEEE 26th International Conference on Data Engineering (ICDE 2010) (2010)
Long Beach, CA, USA
Mar. 1, 2010 to Mar. 6, 2010
Bin Yao , Computer Science Department, Florida State University, Tallahassee, U.S.A.
Feifei Li , Computer Science Department, Florida State University, Tallahassee, U.S.A.
Piyush Kumar , Computer Science Department, Florida State University, Tallahassee, U.S.A.
Finding the k nearest neighbors (kNN) of a query point, or a set of query points (kNN-Join) are fundamental problems in many application domains. Many previous efforts to solve these problems focused on spatial databases or stand-alone systems, where changes to the database engine may be required, which may limit their application on large data sets that are stored in a relational database management system. Furthermore, these methods may not automatically optimize kNN queries or kNN-Joins when additional query conditions are specified. In this work, we study both the kNN query and the kNN-Join in a relational database, possibly augmented with additional query conditions. We search for relational algorithms that require no changes to the database engine. The straightforward solution uses the user-defined-function (UDF) that a query optimizer cannot optimize.We design algorithms that could be implemented by SQL operators without changes to the database engine, hence enabling the query optimizer to understand and generate the “best” query plan. Using only a small constant number of random shifts for databases in any fixed dimension, our approach guarantees to find the approximate kNN with only logarithmic number of page accesses in expectation with a constant approximation ratio and it could be extended to find the exact kNN efficiently in any fixed dimension. Our design paradigm easily supports the kNN-Join and updates. Extensive experiments on large, real and synthetic, data sets confirm the efficiency and practicality of our approach.
P. Kumar, B. Yao and F. Li, "K nearest neighbor queries and kNN-Joins in large relational databases (almost) for free," 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010)(ICDE), Long Beach, CA, USA, 2010, pp. 4-15.