Data Node Splitting Policies for Improved Range Query Efficiency in k-dimensional Point Data Indexes
2012 16th Panhellenic Conference on Informatics (2011)
Sept. 30, 2011 to Oct. 2, 2011
DOI: http://doi.ieeecomputersociety.org/10.1109/PCI.2011.46
High-dimensional vectors (points) are very common in image and video classification, time series data mining, and many other modern data mining applications. One of the most popular classification methods on such data is k-Nearest Neighbor (kNN) searching. Unfortunately, all proposed multi-attribute indexes, including the state-of-the-art ones, fall short in terms of usability as dimensionality increases. This is attributed to the "dimensionality curse" problem, according to which range searching above 10 dimensions is no more efficient than a sequential scan of the entire database. Thus, kNN searching, as a special case of range searching, stands to benefit considerably if we find ways to improve the performance of indexes in high dimensions. In this paper, we focus on space-partitioning indexes and propose six data node splitting techniques. We examine their performance in terms of data node storage utilization and quality of space partitioning, two conflicting goals that are both essential for good range query performance. Our experiments with uniform and skewed data demonstrate that certain splitting techniques can perform satisfactorily.
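The abstract does not name the six splitting techniques, so the sketch below does not reproduce them. As a hypothetical illustration of why storage utilization and partitioning quality conflict, it contrasts two generic policies for splitting an overflowing data node of k-dimensional points: a midpoint split (halve the node's region along its longest side) and a median split (cut at the median coordinate of the dimension with the largest point spread). The names `Region`, `midpoint_split`, `median_split`, `aspect_ratio`, and `CAPACITY` are assumptions introduced for this example only.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, ...]


@dataclass(frozen=True)
class Region:
    """Hypothetical axis-aligned box covered by a data node: low[i] <= x[i] < high[i]."""
    low: Tuple[float, ...]
    high: Tuple[float, ...]

    def extent(self, d: int) -> float:
        return self.high[d] - self.low[d]


def split_region(region: Region, d: int, value: float) -> Tuple[Region, Region]:
    """Cut the region with the hyperplane x[d] = value into a left and right box."""
    left_high = region.high[:d] + (value,) + region.high[d + 1:]
    right_low = region.low[:d] + (value,) + region.low[d + 1:]
    return Region(region.low, left_high), Region(right_low, region.high)


def partition(points: List[Point], d: int, value: float) -> Tuple[List[Point], List[Point]]:
    """Send points with x[d] < value left, all others right."""
    left = [p for p in points if p[d] < value]
    right = [p for p in points if p[d] >= value]
    return left, right


def midpoint_split(points: List[Point], region: Region):
    """Halve the region along its longest side, ignoring where the points lie.
    Keeps regions regular, but skewed data may all land on one side."""
    d = max(range(len(region.low)), key=region.extent)
    value = region.low[d] + region.extent(d) / 2
    return d, value, partition(points, d, value)


def median_split(points: List[Point], region: Region):
    """Cut at the median coordinate of the dimension with the largest point spread.
    Balances occupancy, but the resulting regions may be elongated."""
    dims = range(len(region.low))
    d = max(dims, key=lambda i: max(p[i] for p in points) - min(p[i] for p in points))
    coords = sorted(p[d] for p in points)
    value = coords[len(coords) // 2]
    return d, value, partition(points, d, value)


def aspect_ratio(region: Region) -> float:
    """Longest side over shortest side; 1.0 is a perfect hypercube."""
    extents = [region.extent(i) for i in range(len(region.low))]
    return max(extents) / min(extents)


if __name__ == "__main__":
    CAPACITY = 4  # hypothetical data node capacity (points per node)
    node_region = Region((0.0, 0.0), (1.0, 1.0))
    # Five points (one more than CAPACITY) clustered near x = 0: a skewed overflow.
    overflow = [(0.01, 0.5), (0.02, 0.1), (0.03, 0.9), (0.04, 0.3), (0.06, 0.2)]
    for policy in (midpoint_split, median_split):
        d, value, (left, right) = policy(overflow, node_region)
        regions = split_region(node_region, d, value)
        still_overflows = len(left) > CAPACITY or len(right) > CAPACITY
        print(policy.__name__, len(left), len(right),
              [round(aspect_ratio(r), 2) for r in regions],
              "still overflows:", still_overflows)
```

With this skewed input, the midpoint split produces two equally shaped regions but leaves all five points on one side, so the node still overflows and the sibling is empty; the median split places two and three points in the new nodes at the cost of less regular region shapes. Real splitting policies, presumably including those studied in the paper, have to balance these two effects.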
Keywords: multi-attribute point data indexes, average storage utilization, space partitioning quality, range query performance
Georgios Evangelidis, Evangelos Outsios, "Data Node Splitting Policies for Improved Range Query Efficiency in k-dimensional Point Data Indexes", 2012 16th Panhellenic Conference on Informatics, pp. 46-50, 2011, doi:10.1109/PCI.2011.46