Issue No. 03 - March (2017 vol. 29)
ISSN: 1041-4347
pp: 556-571
Lu Chen , College of Computer Science, Zhejiang University, Hangzhou, China
Yunjun Gao , College of Computer Science, Zhejiang University, Hangzhou, China
Xinhan Li , College of Computer Science, Zhejiang University, Hangzhou, China
Christian S. Jensen , Department of Computer Science, Aalborg University, Aalborg, Denmark
Gang Chen , College of Computer Science, Zhejiang University, Hangzhou, China
ABSTRACT
Spatial queries, including similarity search and similarity joins, are useful in many areas, such as multimedia retrieval and data integration. However, they are not supported well by commercial DBMSs. This may be due to the complex data types involved and the need for flexible similarity criteria seen in real applications. In this paper, we propose a versatile and efficient disk-based index for metric data, the Space-filling curve and Pivot-based B$^+$-tree (SPB-tree). This index leverages the B$^+$-tree and uses a space-filling curve to cluster data into compact regions, thus achieving storage efficiency. It utilizes a small set of so-called pivots to significantly reduce the number of distance computations when using the index. Further, it makes use of a separate random access file to support a broad range of data. By design, it is easy to integrate the SPB-tree into an existing DBMS. We present efficient algorithms for processing similarity search and similarity joins, as well as corresponding cost models based on SPB-trees. Extensive experiments using both real and synthetic data show that, compared with state-of-the-art competitors, the SPB-tree has much lower construction cost and smaller storage size, and supports more efficient similarity search and similarity joins, with highly accurate cost models.
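The two ideas named in the abstract, pivot-based filtering and space-filling-curve keys, can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the integer-valued distance, and the use of a Z-order (Morton) curve are assumptions made for illustration, since the abstract does not say which curve is used. A sorted list stands in for the B$^+$-tree leaf order, and the query scans it linearly rather than traversing a tree.

```python
from typing import Callable, List, Sequence, Tuple

Distance = Callable[[object, object], int]

def pivot_map(obj, pivots: Sequence, dist: Distance) -> List[int]:
    """Map an object to the vector of its distances to each pivot."""
    return [dist(obj, p) for p in pivots]

def z_order(coords: Sequence[int], bits: int) -> int:
    """Interleave the low `bits` bits of each non-negative integer
    coordinate into a single Z-order (Morton) key."""
    key = 0
    n = len(coords)
    for b in range(bits):
        for i, c in enumerate(coords):
            key |= ((c >> b) & 1) << (b * n + i)
    return key

def build_index(data: Sequence, pivots: Sequence, dist: Distance,
                bits: int) -> List[Tuple[int, object, List[int]]]:
    """Return (key, object, pivot vector) entries sorted by key.
    The sorted list is a hypothetical stand-in for B+-tree leaves."""
    entries = []
    for obj in data:
        mapped = pivot_map(obj, pivots, dist)
        entries.append((z_order(mapped, bits), obj, mapped))
    entries.sort(key=lambda e: e[0])
    return entries

def can_prune(q_map: Sequence[int], o_map: Sequence[int], radius: int) -> bool:
    """Triangle inequality: |d(q,p) - d(o,p)| <= d(q,o) for every pivot p.
    If that lower bound exceeds the radius, o cannot be a result."""
    return any(abs(a - b) > radius for a, b in zip(q_map, o_map))

def range_query(query, radius: int, index, pivots: Sequence,
                dist: Distance) -> Tuple[List, int]:
    """Return the objects within `radius` of `query`, plus the number of
    object-to-query distance computations that pruning could not avoid."""
    q_map = pivot_map(query, pivots, dist)
    results, computed = [], 0
    for _key, obj, o_map in index:
        if can_prune(q_map, o_map, radius):
            continue  # filtered using only precomputed pivot distances
        computed += 1
        if dist(query, obj) <= radius:
            results.append(obj)
    return results, computed

if __name__ == "__main__":
    # Toy metric space: integers under absolute difference.
    dist = lambda a, b: abs(a - b)
    pivots = [0, 100]
    index = build_index([5, 50, 95], pivots, dist, bits=7)
    print(range_query(48, 5, index, pivots, dist))
```

In this toy setting, objects whose pivot distances differ from the query's by more than the radius are discarded without computing their distance to the query, which is the saving the abstract attributes to pivots.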
INDEX TERMS
Indexes, Extraterrestrial measurements, Search problems, Query processing, Acceleration, Clustering algorithms
CITATION
Lu Chen, Yunjun Gao, Xinhan Li, Christian S. Jensen, Gang Chen, "Efficient Metric Indexing for Similarity Search and Similarity Joins", IEEE Transactions on Knowledge & Data Engineering, vol. 29, no. 3, pp. 556-571, March 2017, doi:10.1109/TKDE.2015.2506556