Proceedings. 20th International Conference on Data Engineering (2004)
Mar. 30, 2004 to Apr. 2, 2004
Nick Koudas , AT&T Labs Research, Shannon Laboratory Florham Park, USA
Beng Chin Ooi , National University of Singapore
Heng Tao Shen , National University of Singapore
Anthony K. H. Tung , National University of Singapore
Recent advances in research fields like multimedia and bioinformatics have brought about a new generation of hyper-dimensional databases which can contain hundreds or even thousands of dimensions. Such hyper-dimensional databases pose significant problems to existing high-dimensional indexing techniques which have been developed for indexing databases with (commonly) less than a hundred dimensions. To support efficient querying and retrieval on hyper-dimensional databases, we propose a methodology called Local Digital Coding (LDC) which can support k-nearest neighbors (KNN) queries on hyper-dimensional databases and yet co-exist with ubiquitous indices, such as B<sup>+</sup>-trees. LDC extracts a simple bitmap representation called Digital Code(DC) for each point in the database.Pruning during KNN search is performed by dynamically selecting only a subset of the bits from the DC based on which subsequent comparisons are performed. In doing so, expensive operations involved in computing L-norm distance functions between hyper-dimensional data can be avoided. Extensive experiments are conducted to show that our methodology offers significant performance advantages over other existing indexing methods on both real life and synthetic hyper-dimensional datasets.
H. T. Shen, N. Koudas, B. C. Ooi and A. K. Tung, "LDC: Enabling Search By Partial Distance In A Hyper-Dimensional Space," Proceedings. 20th International Conference on Data Engineering(ICDE), Boston, Massachusetts, 2004, pp. 6.