Performance Analysis of Parallel Query Processing Algorithms for Object-Oriented Databases
November/December 2000 (vol. 12 no. 6)
pp. 979-997

Abstract: In recent years, parallel processing and optimization algorithms for object-oriented databases have drawn considerable attention from the database research community. Two general types of algorithms have been introduced: hybrid-hash pointer-based algorithms and multiwavefront algorithms. In this work, we quantitatively analyze the two algorithms and develop analytical formulas to capture the main performance features of both approaches. We study their performance in three application environments: the first is characterized by large databases having many object classes, each of which contains a large number of instances; the second by large databases having many object classes, each of which contains a relatively small number of instances; and the third by large databases having object classes of varying sizes. A horizontal data partitioning strategy, in which each object class is partitioned into horizontal segments stored across all processors, is used in the first environment. A class-per-node assignment strategy, in which the instances of each object class are stored in a single processor, is used in the second environment. In the third environment, object classes are partitioned horizontally and assigned to a varying number of processors depending on their sizes. Our analytical results show that the multiwavefront algorithm has three distinguishing features that contribute to its better performance: 1) a two-phase processing strategy, 2) vertical partitioning of horizontal segments, and 3) dynamic determination of the “collision point” in multiwavefront propagations, which results in an optimized query execution plan. We show that if these features are adopted by a hybrid-hash pointer-based algorithm, its performance will be comparable with that of the multiwavefront algorithm because the difference in CPU time between them is negligible.
The assumed computing environment is a network of workstations having a shared-nothing architecture. The schema and some queries selected from the OO7 benchmark are used in the performance analyses and comparisons. The queries are modified slightly in different data environments in order to reflect the features of diverse database applications.
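
The three data placement strategies named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the round-robin rule for filling segments, the `segment_size` threshold used to decide how many processors a class receives, and the class names in the usage note are all assumptions made for illustration. The abstract only states where instances end up, not how segments are formed.

```python
import math


def horizontal(instances, nodes):
    """Split one class's instances into horizontal segments, one per node.

    Round-robin assignment is an assumption; the abstract does not say
    how instances are distributed among segments.
    """
    segments = {node: [] for node in nodes}
    for i, obj in enumerate(instances):
        segments[nodes[i % len(nodes)]].append(obj)
    return segments


def place_all_nodes(classes, num_nodes):
    """First environment: every class is segmented across all processors."""
    nodes = list(range(num_nodes))
    return {name: horizontal(objs, nodes) for name, objs in classes.items()}


def place_class_per_node(classes, num_nodes):
    """Second environment: all instances of a class live on a single processor."""
    return {
        name: horizontal(objs, [k % num_nodes])
        for k, (name, objs) in enumerate(classes.items())
    }


def place_by_size(classes, num_nodes, segment_size):
    """Third environment: larger classes are spread over more processors.

    `segment_size` (target instances per processor) is a hypothetical
    parameter, not something taken from the paper.
    """
    placement = {}
    for name, objs in classes.items():
        wanted = max(1, math.ceil(len(objs) / segment_size))
        nodes = list(range(min(num_nodes, wanted)))
        placement[name] = horizontal(objs, nodes)
    return placement
```

For example, with `classes = {"BigClass": list(range(10)), "SmallClass": list(range(2))}` and four processors, `place_all_nodes` gives every class a segment on each of the four nodes, `place_class_per_node` puts each class on exactly one node, and `place_by_size(classes, 4, 4)` spreads `BigClass` over three nodes while keeping `SmallClass` on one.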

Index Terms:
Object-oriented databases, parallel query processing algorithms, performance analysis, data partitioning strategies, database benchmark.
Citation:
Stanley Y.W. Su, Sanjay Ranka, Xiang He, "Performance Analysis of Parallel Query Processing Algorithms for Object-Oriented Databases," IEEE Transactions on Knowledge and Data Engineering, vol. 12, no. 6, pp. 979-997, Nov.-Dec. 2000, doi:10.1109/69.895805