16th International Conference on Data Engineering (ICDE) (2000)

San Diego, California

Feb. 28, 2000 to Mar. 3, 2000

ISSN: 1063-6382

ISBN: 0-7695-0506-6

pp: 244

Paolo Ciaccia, University of Bologna and CSITE-CNR

Marco Patella, University of Bologna and CSITE-CNR

ABSTRACT

In high-dimensional and complex metric spaces, determining the nearest neighbor (NN) of a query object $q$ can be a very expensive task, because index structures partition such spaces poorly, the so-called "curse of dimensionality". This also affects approximately correct (AC) algorithms, which return as result a point whose distance from $q$ is less than $(1 + \epsilon)$ times the distance between $q$ and its true NN.

In this paper we introduce a new approach to approximate similarity search, called PAC-NN queries, where the error bound $\epsilon$ can be exceeded with probability $\delta$, and both the $\epsilon$ and $\delta$ parameters can be tuned at query time to trade the quality of the result for the cost of the search.

We describe sequential and index-based PAC-NN algorithms that exploit the distance distribution of the query object in order to determine a stopping condition that respects the error bound. Analysis and experimental evaluation of the sequential algorithm confirm that, for moderately large data sets and suitable $\epsilon$ and $\delta$ values, PAC-NN queries can be efficiently solved and the error controlled. We then provide experimental evidence that indexing can further speed up the retrieval process by up to 1-2 orders of magnitude without giving up the accuracy of the result.
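To make the stopping condition concrete, here is a minimal Python sketch of a sequential scan with early termination. It is an illustration under stated assumptions, not the authors' implementation: the names `nn_radius_bound`, `pac_nn_scan`, and `inv_cdf` are hypothetical, and the sketch assumes that the distances from the query to the $n$ data objects are independent draws from a known distribution $F_q$, so that the NN distance satisfies $G_q(r) = 1 - (1 - F_q(r))^n$. The scan stops as soon as it finds an object within $(1 + \epsilon)\, r_\delta$ of the query, where $r_\delta$ is the largest radius with $G_q(r_\delta) \le \delta$.

```python
import math


def nn_radius_bound(inv_cdf, n, delta):
    """Largest radius r with G(r) = 1 - (1 - F(r))**n <= delta.

    Solving for F gives F(r) <= 1 - (1 - delta)**(1/n), so r is the
    corresponding quantile of F. `inv_cdf` (hypothetical) is the inverse
    of the assumed distance distribution F of the query object.
    """
    return inv_cdf(1.0 - (1.0 - delta) ** (1.0 / n))


def pac_nn_scan(data, q, dist, eps, delta, inv_cdf):
    """Sequential scan that stops early once the current best is 'good enough'.

    If the true NN distance exceeds r_delta (probability >= 1 - delta under
    the assumed model), any object within (1 + eps) * r_delta of q is also
    within (1 + eps) times the true NN distance.
    Returns (best_object, best_distance, objects_scanned).
    """
    r_delta = nn_radius_bound(inv_cdf, len(data), delta)
    threshold = (1.0 + eps) * r_delta
    best, best_d = None, math.inf
    scanned = 0
    for p in data:
        scanned += 1
        d = dist(q, p)
        if d < best_d:
            best, best_d = p, d
        if best_d <= threshold:
            break  # stopping condition met: return the current best
    return best, best_d, scanned


if __name__ == "__main__":
    # Toy setting (assumption): points uniform on [0, 1], query at 0.5,
    # so F(r) = 2r on [0, 0.5] and its inverse is p / 2.
    points = [0.9, 0.1, 0.52, 0.5001, 0.7]
    result = pac_nn_scan(points, 0.5, lambda a, b: abs(a - b),
                         eps=1.0, delta=0.1, inv_cdf=lambda p: p / 2.0)
    print(result)
```

With $\delta = 0$ the threshold collapses to the smallest possible distance and the scan degenerates to an exact search; larger $\epsilon$ or $\delta$ raises the threshold and lets the scan stop earlier, which is the quality-for-cost trade-off the abstract describes.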

INDEX TERMS

Nearest Neighbor Search, Metric Spaces, Curse of Dimensionality, Approximate Queries, Distance Distribution

CITATION

Paolo Ciaccia, Marco Patella, "PAC Nearest Neighbor Queries: Approximate and Controlled Search in High-Dimensional and Metric Spaces", *16th International Conference on Data Engineering (ICDE)*, pp. 244, 2000, doi:10.1109/ICDE.2000.839417