International Conference on Information Technology: Coding and Computing
Performance of KDB-Trees with Query-Based Splitting
Las Vegas, Nevada
April 08-April 10
ISBN: 0-7695-1506-1
While the persistent data of many advanced database applications, such as OLAP and scientific studies, are characterized by very high dimensionality, typical queries posed on these data appeal to a small number of relevant dimensions. Unfortunately, the multi-dimensional access methods designed for high-dimensional data perform rather poorly for these partially specified queries. A potentially very appealing idea, frequently suggested in the literature, is to adopt a node-splitting policy that takes into account the "importance" of individual dimensions, which could be determined either a priori or through a statistical sampling of actual queries. This paper presents the results of some carefully controlled experiments conducted to observe the effects of query-based splitting on the performance of KDB-trees. The strategy is compared to a splitting policy that selects the split dimensions in a "cyclic" fashion, which has been shown to be very effective, especially in high-dimensional situations. Based on the results, the query-based splitting does not appear to be a very appealing splitting strategy for KDB-trees.
Index Terms:
information databases, multi-dimensional databases, access methods, data dimensionality
Citation:
Yves Lépouchard, John L. Pfaltz, Ratko Orlandic, "Performance of KDB-Trees with Query-Based Splitting," itcc, pp.0218, International Conference on Information Technology: Coding and Computing, 2002