A Hierarchical Task Queue Organization for Shared-Memory Multiprocessor Systems
January 1995 (vol. 6 no. 1)
pp. 1-16

Abstract—There are two basic ways to organize ready tasks waiting for a processor: a centralized organization or a distributed organization. In the centralized organization, a single task queue is shared by all processors. In the distributed organization, each processor maintains its own private ready queue. A central ready queue shared by all processors is preferable in principle because it provides perfect load sharing. However, the centralized organization is not suitable for large parallel systems, because the central task queue can become a system bottleneck. The distributed organization, on the other hand, suffers from load imbalance, which degrades performance. Techniques have been proposed to reduce task queue contention in the centralized organization and load imbalance in the distributed organization, but these techniques introduce problems of their own. In this paper, we propose a hierarchical task queue organization that combines the best features of these two organizations. Our performance study shows that a properly designed hierarchical organization performs very close to the centralized organization while eliminating the ready queue contention problem. We also provide an analysis that offers guidance on designing a hierarchical task queue organization that avoids ready queue access contention. A brief discussion of task scheduling policies is also included.

Index Terms—Hierarchical organization, multiprocessor systems, parallel systems, performance evaluation, process scheduling, scheduling overhead.
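
The contrast between a single shared queue and a tree of queues can be illustrated with a small, single-threaded Python sketch. It is not the authors' design; it rests on assumptions chosen only for illustration: queues form a tree with branching factor `branching`, every new task enters the root queue, each processor dispatches from a private leaf queue, and a queue that runs dry pulls a batch of `transfer * span` tasks from its parent, where `span` is the number of processors below it. The names `branching`, `transfer`, `span`, `batch_pulls`, and `drain` are hypothetical. The sketch counts how often the root queue is accessed, which stands in for the contention a central queue would see.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class QueueNode:
    """One ready queue in the tree; the leaves are per-processor private queues."""
    parent: Optional["QueueNode"] = None
    tasks: deque = field(default_factory=deque)
    span: int = 0          # number of processors (leaves) served by this queue
    batch_pulls: int = 0   # batch transfers taken out of this queue


class HierarchicalTaskQueue:
    """Illustrative tree of ready queues (hypothetical parameters).

    Assumed rules: new tasks enter the root queue, and an empty queue pulls
    `transfer * span` tasks from its parent, so queues nearer the root are
    accessed less often than the private leaf queues.
    """

    def __init__(self, num_procs: int, branching: int, transfer: int):
        assert num_procs >= 1 and branching >= 2 and transfer >= 1
        self.transfer = transfer
        self.root = QueueNode()
        level = [self.root]
        # Grow the tree one level at a time until there is a leaf per processor.
        while len(level) < num_procs:
            level = [QueueNode(parent=p) for p in level for _ in range(branching)]
        self.leaves = level[:num_procs]
        # Record how many processors sit below each queue.
        for leaf in self.leaves:
            node = leaf
            while node is not None:
                node.span += 1
                node = node.parent

    def submit(self, task) -> None:
        """Assumption: all arriving tasks are placed in the root queue."""
        self.root.tasks.append(task)

    def _refill(self, node: QueueNode) -> bool:
        """Move one batch from node's parent into node, refilling the parent
        recursively if it is empty. Returns False if no tasks remain above."""
        parent = node.parent
        if parent is None:
            return False
        if not parent.tasks and not self._refill(parent):
            return False
        parent.batch_pulls += 1
        batch = min(self.transfer * node.span, len(parent.tasks))
        for _ in range(batch):
            node.tasks.append(parent.tasks.popleft())
        return True

    def fetch(self, proc: int):
        """Processor `proc` dispatches its next task from its private queue."""
        leaf = self.leaves[proc]
        if not leaf.tasks and not self._refill(leaf):
            return None
        return leaf.tasks.popleft()


def drain(q: HierarchicalTaskQueue, num_procs: int) -> list:
    """Let processors fetch round-robin until a full round finds no work."""
    done, progress = [], True
    while progress:
        progress = False
        for p in range(num_procs):
            task = q.fetch(p)
            if task is not None:
                done.append(task)
                progress = True
    return done


q = HierarchicalTaskQueue(num_procs=8, branching=2, transfer=2)
for t in range(64):
    q.submit(t)
finished = drain(q, 8)
print(len(finished), q.root.batch_pulls)  # 64 8
```

With these assumed parameters, all 64 tasks are dispatched while the root queue is accessed only 8 times, because each root access moves a batch of 8 tasks toward a subtree of 4 processors. A single central queue would instead be accessed on each of the 64 dispatches. In this sketch, larger batches near the root reduce root accesses but can leave tasks parked in one subtree while processors elsewhere are idle, which is the load-sharing side of the trade-off the abstract describes.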

Sivarama P. Dandamudi, Philip S. P. Cheng, "A Hierarchical Task Queue Organization for Shared-Memory Multiprocessor Systems," IEEE Transactions on Parallel and Distributed Systems, vol. 6, no. 1, pp. 1-16, Jan. 1995, doi:10.1109/71.363415