Issue No. 02 - Feb. (2014 vol. 36)
Sudheendra Vijayanarasimhan, Univ. of Texas at Austin, Austin, TX, USA
Prateek Jain, Machine Learning Group, Microsoft Res., Bangalore, India
Kristen Grauman, Univ. of Texas at Austin, Austin, TX, USA
We consider the problem of retrieving the database points nearest to a given hyperplane query without exhaustively scanning the entire database. For this problem, we propose two hashing-based solutions. Our first approach maps the data to 2-bit binary keys that are locality sensitive for the angle between the hyperplane normal and a database point. Our second approach embeds the data into a vector space where the Euclidean norm reflects the desired distance between the original points and the hyperplane query. Both use hashing to retrieve near points in sublinear time. Our first method's preprocessing stage is more efficient, while the second has stronger accuracy guarantees. We apply both to pool-based active learning: taking the current hyperplane classifier as a query, our algorithm identifies those points (approximately) satisfying the well-known minimal distance-to-hyperplane selection criterion. We empirically demonstrate our methods' tradeoffs and show that they make it practical to perform active selection with millions of unlabeled points.
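To make the first approach more concrete, the following is a minimal sketch of one way a 2-bit key can be made sensitive to the angle between a hyperplane normal and a database point. It is an illustration under stated assumptions, not necessarily the exact construction in the paper: it assumes a hyperplane through the origin with normal w, draws two Gaussian random directions u and v, hashes a database point x to (sign(u.x), sign(v.x)), and hashes the query to (sign(u.w), sign(v.(-w))). By the standard random-hyperplane property Pr[sign(u.a) = sign(u.b)] = 1 - theta/pi, the two keys then agree with probability (1 - theta/pi)(theta/pi), where theta is the angle between x and w. That probability peaks at 1/4 when x lies on the hyperplane (theta = pi/2) and falls to 0 when x is parallel to w. All function names below are hypothetical.

```python
import random


def sign_bit(direction, vector):
    """Return 1 if the dot product is non-negative, else 0."""
    dot = sum(d * a for d, a in zip(direction, vector))
    return 1 if dot >= 0 else 0


def make_two_bit_hash(dim, rng):
    """Draw one random 2-bit hash function pair (assumed construction).

    Returns (hash_point, hash_query): database points and hyperplane
    normals are hashed asymmetrically so that points nearly
    perpendicular to the normal collide most often.
    """
    u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]

    def hash_point(x):
        # Database point: both bits use x itself.
        return (sign_bit(u, x), sign_bit(v, x))

    def hash_query(w):
        # Hyperplane normal: the second bit uses the negated normal.
        neg_w = [-wi for wi in w]
        return (sign_bit(u, w), sign_bit(v, neg_w))

    return hash_point, hash_query


def collision_rate(x, w, trials=4000, seed=0):
    """Estimate how often the 2-bit keys of x and w coincide."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        hash_point, hash_query = make_two_bit_hash(len(x), rng)
        if hash_point(x) == hash_query(w):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    normal = [1.0, 0.0, 0.0]
    on_plane = [0.0, 1.0, 0.0]   # perpendicular to the normal
    off_plane = [1.0, 0.0, 0.0]  # parallel to the normal
    print("on hyperplane :", collision_rate(on_plane, normal))   # near 1/4
    print("far from plane:", collision_rate(off_plane, normal))  # 0.0
```

In a full retrieval system, several such 2-bit keys would typically be concatenated and stored in hash tables, so that only the buckets matching the query key need to be scanned; the sketch above only estimates the per-key collision behaviour.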
Vectors, Databases, Approximation methods, Search problems, Approximation algorithms, Euclidean distance, Algorithm design and analysis
S. Vijayanarasimhan, P. Jain and K. Grauman, "Hashing Hyperplane Queries to Near Points with Applications to Large-Scale Active Learning," in IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 36, no. 2, pp. 276-288, 2014.