A Hierarchical Approach to Parallel Multiquery Scheduling
June 1995 (vol. 6 no. 6)
pp. 578-590

Abstract—There has been a good deal of progress made recently toward the efficient parallelization of individual phases of single queries in multiprocessor database systems. In this paper we devise and experimentally evaluate a number of scheduling algorithms designed to handle multiple parallel queries. (Scheduling in this context implies the determination of both processor allotments and temporal processor assignments to individual queries and query phases.) One of these algorithms performs the best in our experiments. This algorithm is hierarchical in nature: In the first phase, a good quality precedence-based schedule is created for each individual query and each possible number of processors. This component employs dynamic programming. In the second phase, the results of the first phase are used to create an overall schedule of the full set of queries. This component is based on previously published work on nonprecedence-based malleable scheduling. Even though the problem we are considering is NP-hard in the strong sense, the multiple query schedules generated by our hierarchical algorithm are seen experimentally to achieve high quality results.
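
The two-phase structure described in the abstract can be illustrated with a small sketch. It is not the algorithm from the paper: the cost model (`speedup_time`), the binary operator-tree representation of a query, and the deadline-based allotment heuristic are all assumptions chosen to keep the example short. Phase 1 uses dynamic programming to tabulate, for one query, a completion time for every processor count. Phase 2 treats each query as a single malleable task and builds an overall schedule from those tables.

```python
from functools import lru_cache

def speedup_time(work, p, overhead=0.05):
    """Hypothetical cost model: ideal speedup plus a per-processor overhead."""
    return work / p + overhead * (p - 1)

def query_times(tree, max_procs):
    """Phase 1 (assumed form): time of one query for each processor count.

    `tree` is a nested tuple (work, left, right); a missing child is None.
    Children run one after the other on all p processors, or side by side
    on a split q / p - q; the node itself runs once its children finish.
    """
    @lru_cache(maxsize=None)
    def best(node, p):
        work, left, right = node
        own = min(speedup_time(work, q) for q in range(1, p + 1))
        kids = [c for c in (left, right) if c is not None]
        if not kids:
            return own
        below = sum(best(c, p) for c in kids)          # serial execution
        if len(kids) == 2:
            for q in range(1, p):                      # concurrent split
                below = min(below, max(best(left, q), best(right, p - q)))
        return own + below

    return [best(tree, p) for p in range(1, max_procs + 1)]

def list_schedule(times, allot, max_procs):
    """Place queries greedily, longest first, on the earliest-free processors."""
    free = [0.0] * max_procs               # time at which each processor idles
    order = sorted(range(len(allot)), key=lambda i: -times[i][allot[i] - 1])
    plan = []
    for i in order:
        p, dur = allot[i], times[i][allot[i] - 1]
        chosen = sorted(range(max_procs), key=free.__getitem__)[:p]
        start = max(free[k] for k in chosen)
        for k in chosen:
            free[k] = start + dur
        plan.append((i, p, start, start + dur))
    return max(free), plan

def schedule(times, max_procs):
    """Phase 2 (assumed heuristic): pick allotments, then list-schedule.

    For each candidate deadline, every query takes the fewest processors
    that meet it; the plan with the smallest makespan is kept.
    Returns (makespan, [(query, procs, start, end), ...]).
    """
    best_plan = None
    for deadline in sorted({t for row in times for t in row}):
        allot = []
        for row in times:
            fits = [p for p in range(1, max_procs + 1) if row[p - 1] <= deadline]
            if not fits:
                break                      # deadline too tight for this query
            allot.append(fits[0])
        else:
            plan = list_schedule(times, allot, max_procs)
            if best_plan is None or plan[0] < best_plan[0]:
                best_plan = plan
    return best_plan

def scan(work):
    """Leaf phase with no children (illustrative helper)."""
    return (work, None, None)

# Three made-up queries on a 4-processor system.
P = 4
queries = [(4.0, scan(8.0), scan(6.0)), scan(10.0), (2.0, scan(3.0), None)]
times = [query_times(q, P) for q in queries]
makespan, plan = schedule(times, P)
for query, procs, start, end in plan:
    print(f"query {query}: {procs} processors, {start:.2f} to {end:.2f}")
print(f"makespan: {makespan:.2f}")
```

The point of the hierarchy is the interface between the phases: the second phase sees only one time-versus-processors table per query, so the precedence constraints inside a query never enter the multiquery packing problem.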

Citation:
Joel L. Wolf, John Turek, Ming-Syan Chen, Philip S. Yu, "A Hierarchical Approach to Parallel Multiquery Scheduling," IEEE Transactions on Parallel and Distributed Systems, vol. 6, no. 6, pp. 578-590, June 1995, doi:10.1109/71.388035