Using Processor Affinity in Loop Scheduling on Shared-Memory Multiprocessors
April 1994 (vol. 5 no. 4)
pp. 379-400

Loops are the single largest source of parallelism in many applications. One way to exploit this parallelism is to execute loop iterations in parallel on different processors. Previous approaches to loop scheduling attempted to achieve the minimum completion time by distributing the workload as evenly as possible while minimizing the number of synchronization operations required. The authors consider a third dimension to the problem of loop scheduling on shared-memory multiprocessors: communication overhead caused by accesses to nonlocal data. They show that traditional algorithms for loop scheduling, which ignore the location of data when assigning iterations to processors, incur a significant performance penalty on modern shared-memory multiprocessors. They propose a new loop scheduling algorithm that attempts to simultaneously balance the workload, minimize synchronization, and co-locate loop iterations with the necessary data. They compare the performance of this new algorithm to other known algorithms by using five representative kernel programs on a Silicon Graphics multiprocessor workstation, a BBN Butterfly, a Sequent Symmetry, and a KSR-1, and show that the new algorithm offers substantial performance improvements, up to a factor of 4 in some cases. The authors conclude that loop scheduling algorithms for shared-memory multiprocessors cannot afford to ignore the location of data, particularly in light of the increasing disparity between processor and memory speeds.
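
The abstract names three goals (balanced workload, few synchronization operations, iterations placed near their data) but does not describe the mechanism. As a rough illustration only, the Python sketch below models one way such a scheduler could be organized: each processor starts with its own contiguous block of iterations, takes shrinking chunks from that block, and takes work from the most heavily loaded processor only when its own block is empty. The class name `AffinityScheduler`, the 1/P chunk fraction, and the steal-from-most-loaded rule are assumptions made for this example, not details taken from the abstract, and the model is single-threaded.

```python
from math import ceil


class AffinityScheduler:
    """Toy, single-threaded model of a locality-aware loop scheduler.

    Hypothetical sketch: the names and policies here are illustrative
    assumptions, not the authors' published algorithm.
    """

    def __init__(self, n_iters, n_procs):
        self.n_procs = n_procs
        # Static initial partition: processor p owns one contiguous block,
        # so repeated executions of the loop revisit the same data.
        bounds = [p * n_iters // n_procs for p in range(n_procs + 1)]
        # Each queue is a half-open range [lo, hi) of pending iterations.
        self.queues = [[bounds[p], bounds[p + 1]] for p in range(n_procs)]

    def remaining(self, p):
        lo, hi = self.queues[p]
        return hi - lo

    def next_chunk(self, p):
        """Return (start, stop, owner) for processor p, or None when done."""
        owner = p
        if self.remaining(p) == 0:
            # Local work exhausted: fall back to the most loaded processor.
            owner = max(range(self.n_procs), key=self.remaining)
            if self.remaining(owner) == 0:
                return None  # every queue is empty
        # Take a 1/P fraction of what is left (assumed policy), so chunks
        # shrink as the loop nears completion and imbalance stays small.
        size = ceil(self.remaining(owner) / self.n_procs)
        lo, hi = self.queues[owner]
        if owner == p:
            # Local work is taken from the front of the processor's own queue.
            self.queues[owner][0] = lo + size
            return (lo, lo + size, owner)
        # Remote work is taken from the back, away from where the owner
        # is currently working.
        self.queues[owner][1] = hi - size
        return (hi - size, hi, owner)
```

In this model a processor calls `next_chunk(p)` repeatedly and executes iterations `start` to `stop - 1` until it receives `None`. A real implementation would protect each queue with its own lock; because most requests touch only the caller's queue, contention would be lower than with a single central work queue.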

Index Terms:
shared memory systems; scheduling; performance evaluation; loop scheduling; processor affinity; shared-memory multiprocessors; loop iterations; communication overhead; iterations; kernel programs; Silicon Graphics multiprocessor; BBN Butterfly; Sequent Symmetry; KSR-1; performance improvements; synchronization; load imbalance
E.P. Markatos, T.J. LeBlanc, "Using Processor Affinity in Loop Scheduling on Shared-Memory Multiprocessors," IEEE Transactions on Parallel and Distributed Systems, vol. 5, no. 4, pp. 379-400, April 1994, doi:10.1109/71.273046