2010 IEEE 26th International Conference on Data Engineering (ICDE 2010)
Surrogate ranking for very expensive similarity queries
Long Beach, CA, USA
March 1-6, 2010
ISBN: 978-1-4244-5445-7
Fei Xu, CISE Department, University of Florida, Gainesville, 32601, USA
Ravi Jampani, CISE Department, University of Florida, Gainesville, 32601, USA
Mingxi Wu, Oracle Corp. Redwood Shores, CA, 94065, USA
Chris Jermaine, CISE Department, University of Florida, Gainesville, 32601, USA
Tamer Kahveci, CISE Department, University of Florida, Gainesville, 32601, USA
We consider the problem of similarity search in applications where computing the similarity between two records is very expensive and the similarity measure is not a metric. In such applications, comparing even a tiny fraction of the database records to a single query record can be orders of magnitude slower than reading the entire database from disk, and indexing is often not possible. We develop a general-purpose statistical framework for answering top-k queries in such databases when the database administrator can supply an inexpensive surrogate ranking function that substitutes for the actual similarity measure. We also develop a robust method that learns the relationship between the surrogate function and the similarity measure. Given a query, we use Bayesian statistics to update the model, taking into account the observed partial results. Using the updated model, we construct bounds on the accuracy of the result set obtained via the surrogate ranking. Our experiments show that our models can produce useful bounds for several real-life applications.
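The basic idea of answering a top-k query through a cheap surrogate can be illustrated with a filter-and-refine sketch: rank every record by the surrogate, spend the expensive similarity computation only on the best-ranked candidates, and return the k best of those. This is a minimal sketch under stated assumptions, not the authors' framework. It omits the learned surrogate-to-similarity model, the Bayesian update, and the accuracy bounds. The names `surrogate_top_k`, `surrogate`, `similarity`, and `budget` are hypothetical, and higher scores are assumed to mean "more similar".

```python
import heapq
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def surrogate_top_k(
    query: T,
    records: Sequence[T],
    surrogate: Callable[[T, T], float],   # hypothetical cheap ranking function
    similarity: Callable[[T, T], float],  # hypothetical expensive measure
    k: int,
    budget: int,                          # max number of expensive calls
) -> List[Tuple[float, T]]:
    """Return up to k (score, record) pairs ranked by the expensive similarity,
    evaluating it only on the `budget` records the surrogate ranks highest."""
    if budget < k:
        raise ValueError("budget must be at least k")
    # Filter: one cheap pass over the whole database using the surrogate.
    candidates = heapq.nlargest(budget, records, key=lambda r: surrogate(query, r))
    # Refine: the expensive similarity is computed only for the candidates.
    scored = [(similarity(query, r), r) for r in candidates]
    # Keep the k best candidates by true similarity.
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])
```

For example, with integer records, a similarity of `-abs(q - r)`, and a coarse surrogate that only compares `q // 10` with `r // 10`, a budget of 20 evaluates the expensive measure 20 times instead of once per record. The result is only as good as the surrogate: a true top-k record ranked outside the budget is silently missed. That gap is what the paper's accuracy bounds are meant to quantify.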
Citation:
Fei Xu, Ravi Jampani, Mingxi Wu, Chris Jermaine, and Tamer Kahveci, "Surrogate ranking for very expensive similarity queries," in Proc. 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010), pp. 848-859, 2010.