IEEE Transactions on Knowledge & Data Engineering, vol. 12, no. 6, November/December 2000
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/69.895805
<p><b>Abstract</b>: In recent years, parallel processing and optimization algorithms for processing object-oriented databases have drawn a considerable amount of attention from the database research community. Two general types of algorithms have been introduced: hybrid-hash pointer-based algorithms and multiwavefront algorithms. In this work, we quantitatively analyze the two algorithms and develop analytical formulas to capture the main performance features of these two approaches. We study their performance in three application environments: the first is characterized by large databases having many object classes, each of which contains a large number of instances; the second is characterized by large databases having many object classes, each of which contains a relatively small number of instances; and the third is characterized by large databases having object classes of varying sizes. A horizontal data partitioning strategy, in which each object class is partitioned into horizontal segments stored across all processors, is used in the first environment. A class-per-node assignment strategy, in which instances of each object class are stored in a single processor, is used in the second environment. In the third environment, object classes are partitioned horizontally and assigned to a varying number of processors depending on their sizes. Our analytical results show that the multiwavefront algorithm has three distinguishing features which contribute to its better performance: 1) a two-phase processing strategy, 2) vertical partitioning of horizontal segments, and 3) dynamic determination of the “collision point” in multiwavefront propagations, which results in an optimized query execution plan. We show that if these features are adopted by a hybrid-hash pointer-based algorithm, its performance will be comparable with that of the multiwavefront algorithm because the difference in CPU time between them is negligible.
The assumed computing environment is a network of workstations having a shared-nothing architecture. The schema and some queries selected from the OO7 benchmark are used in the performance analyses and comparisons. The queries are modified slightly in different data environments in order to reflect the features of diverse database applications.</p>
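The three data placement strategies named in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function names, the round-robin node choice, the `segment_capacity` threshold used to decide how many processors a class receives, and the instance counts are all assumptions made for illustration. Only the class names echo the OO7 schema.

```python
from typing import Dict

# Placement maps: class name -> {node id -> number of instances stored there}.
Placement = Dict[str, Dict[int, int]]


def horizontal_partition(class_sizes: Dict[str, int], num_nodes: int) -> Placement:
    """Environment 1: split every class into segments stored across all nodes."""
    placement: Placement = {}
    for cls, count in class_sizes.items():
        base, extra = divmod(count, num_nodes)
        # The first `extra` nodes hold one more instance than the rest.
        placement[cls] = {
            node: base + (1 if node < extra else 0) for node in range(num_nodes)
        }
    return placement


def class_per_node(class_sizes: Dict[str, int], num_nodes: int) -> Placement:
    """Environment 2: store all instances of a class on a single node."""
    placement: Placement = {}
    for index, (cls, count) in enumerate(sorted(class_sizes.items())):
        # Round-robin choice of node is an assumption, not the paper's rule.
        placement[cls] = {index % num_nodes: count}
    return placement


def size_dependent_partition(
    class_sizes: Dict[str, int], num_nodes: int, segment_capacity: int
) -> Placement:
    """Environment 3: larger classes are spread over more nodes.

    `segment_capacity` is a hypothetical knob: a class receives roughly one
    node per `segment_capacity` instances, capped at `num_nodes`.
    """
    placement: Placement = {}
    next_node = 0
    for cls, count in sorted(class_sizes.items()):
        wanted = -(-count // segment_capacity)  # ceiling division
        k = min(num_nodes, max(1, wanted))
        base, extra = divmod(count, k)
        nodes = [(next_node + j) % num_nodes for j in range(k)]
        placement[cls] = {
            node: base + (1 if j < extra else 0) for j, node in enumerate(nodes)
        }
        next_node = (next_node + k) % num_nodes
    return placement


# Illustrative instance counts only; these are not OO7 benchmark sizes.
sizes = {"AtomicPart": 10, "CompositePart": 4, "Document": 3}
horizontal = horizontal_partition(sizes, num_nodes=4)
per_node = class_per_node(sizes, num_nodes=4)
varying = size_dependent_partition(sizes, num_nodes=4, segment_capacity=4)
```

With these inputs, `horizontal` spreads every class over all four nodes, `per_node` keeps each class on exactly one node, and `varying` gives the largest class three nodes while the two small classes get one node each.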
Object-oriented databases, parallel query processing algorithms, performance analysis, data partitioning strategies, database benchmark.
Stanley Y.W. Su, Sanjay Ranka, Xiang He, "Performance Analysis of Parallel Query Processing Algorithms for Object-Oriented Databases", IEEE Transactions on Knowledge & Data Engineering, vol.12, no. 6, pp. 979-997, November/December 2000, doi:10.1109/69.895805