Issue No.06 - June (2006 vol.18)
pp: 748-763
ABSTRACT
Support Vector Machines (SVMs) have been adopted by many data mining and information-retrieval applications for learning a mining or query concept, and then retrieving the "top-k" best matches to the concept. However, when the data set is large, naively scanning the entire data set to find the top matches is not scalable. In this work, we propose a kernel indexing strategy to substantially prune the search space and, thus, improve the performance of top-k queries. Our kernel indexer (KDX) takes advantage of the underlying geometric properties and quickly converges on an approximate set of top-k instances of interest. More importantly, once the kernel (e.g., Gaussian kernel) has been selected and the indexer has been constructed, the indexer can work with different kernel-parameter settings (e.g., γ and σ) without performance compromise. Through theoretical analysis and empirical studies on a wide variety of data sets, we demonstrate KDX to be very effective. An earlier version of this paper appeared in the 2005 SIAM International Conference on Data Mining [24]. This version differs from the previous submission in four ways: it provides a detailed cost analysis under different scenarios, specifically designed to meet the varying needs of accuracy, speed, and space requirements; it develops an approach for insertion and deletion of instances; it presents the specific computations, as well as the geometric properties used in performing them; and it provides detailed algorithms for each of the operations necessary to create and use the index structure.
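For orientation, the naive full scan that the abstract describes as not scalable can be sketched as follows. This is only an illustrative baseline, not KDX and not the authors' code: the function names, the use of the largest SVM decision value as the "best match" criterion, and the parameterization of the Gaussian kernel as exp(-gamma * ||x - y||^2) are all assumptions made for this sketch.

```python
import heapq
import math


def gaussian_kernel(x, y, gamma):
    """Gaussian (RBF) kernel, assumed here to be exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def svm_score(x, support_vectors, coeffs, bias, gamma):
    """Decision value of a trained kernel SVM at point x.

    coeffs[i] is assumed to hold alpha_i * y_i for support vector i.
    """
    total = sum(
        c * gaussian_kernel(sv, x, gamma)
        for sv, c in zip(support_vectors, coeffs)
    )
    return total + bias


def naive_top_k(data, k, support_vectors, coeffs, bias, gamma):
    """Score every instance and return the indices of the k highest scores.

    Every query touches all instances and all support vectors, which is
    the linear scan an index is meant to avoid.
    """
    scored = (
        (svm_score(x, support_vectors, coeffs, bias, gamma), i)
        for i, x in enumerate(data)
    )
    return [i for _, i in heapq.nlargest(k, scored)]


# Hypothetical toy model: one positive and one negative support vector.
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
coeffs = [1.0, -1.0]
data = [(0.0, 0.0), (2.0, 2.0), (0.1, 0.0), (1.0, 1.0), (1.9, 2.0)]
top = naive_top_k(data, 2, support_vectors, coeffs, bias=0.0, gamma=1.0)
```

Changing gamma in this baseline requires rescoring the whole data set on each query; the abstract's claim is that KDX can accommodate such parameter changes without rebuilding its index.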
INDEX TERMS
Support vector machine, indexing, top-k retrieval.
CITATION
Navneet Panda, Edward Y. Chang, "KDX: An Indexer for Support Vector Machines", IEEE Transactions on Knowledge & Data Engineering, vol.18, no. 6, pp. 748-763, June 2006, doi:10.1109/TKDE.2006.101