Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Improving the Scalability of Parallel Jobs by adding Parallel Awareness to the Operating System
Phoenix, Arizona
November 15-November 21
ISBN: 1-58113-695-1
Terry Jones, Lawrence Livermore National Laboratory, Livermore, CA
Shawn Dawson, Lawrence Livermore National Laboratory, Livermore, CA
Rob Neely, Lawrence Livermore National Laboratory, Livermore, CA
William Tuel, International Business Machines Corporation, Armonk, NY
Larry Brenner, International Business Machines Corporation, Armonk, NY
Jeffrey Fier, International Business Machines Corporation, Armonk, NY
Robert Blackmore, International Business Machines Corporation, Armonk, NY
Patrick Caffrey, International Business Machines Corporation, Armonk, NY
Brian Maskell, Atomic Weapons Establishment, Aldermaston Reading, UK
Paul Tomlinson, Atomic Weapons Establishment, Aldermaston Reading, UK
Mark Roberts, Atomic Weapons Establishment, Aldermaston Reading, UK
A parallel application benefits from scheduling policies that take a global view of the application's process working set. As interactions among cooperating processes increase, mechanisms that reduce waiting within one or more of the processes become more important. In particular, collective operations such as barriers and reductions are extremely sensitive to events that are usually harmless, such as context switches among members of the process working set. For the last 18 months, we have been researching the impact of random short-lived interruptions, such as timer-decrement processing and periodic daemon activity, and developing strategies to minimize their impact on bulk-synchronous SPMD programs running at large processor counts. We present a novel co-scheduling scheme for improving the performance of fine-grain collective activities such as barriers and reductions, describe an implementation consisting of operating system kernel modifications and a run-time system, and present a set of empirical results comparing the technique with traditional operating system scheduling. Our results indicate a speedup of over 300% on synchronizing collectives.
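The sensitivity of barriers to short random interruptions can be illustrated with a toy model. The sketch below is not the scheme or implementation described in the paper. It assumes each bulk-synchronous step costs a fixed amount of work, that each process is delayed with some small probability per step, and that a barrier finishes only when the slowest process arrives. All names and parameters (`mean_step_time`, `p_interrupt`, `delay`, and so on) are hypothetical and chosen only for illustration.

```python
import random


def expected_step_time(n_procs, work, p_interrupt, delay, coordinated):
    """Closed-form mean step time under the toy model's assumptions."""
    if coordinated:
        # Interruptions are aligned across processes: a step is slow with probability p.
        p_any = p_interrupt
    else:
        # Independent interruptions: a step is slow if ANY process is hit.
        p_any = 1.0 - (1.0 - p_interrupt) ** n_procs
    return work + delay * p_any


def mean_step_time(n_procs, n_steps, work, p_interrupt, delay, coordinated, seed=0):
    """Simulated mean time per step, where each step ends in a barrier."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_steps):
        if coordinated:
            # Every process takes the interruption in the same step, or none does.
            hit = rng.random() < p_interrupt
            slowest = work + (delay if hit else 0.0)
        else:
            # Each process is interrupted independently; the barrier waits for the slowest.
            slowest = max(
                work + (delay if rng.random() < p_interrupt else 0.0)
                for _ in range(n_procs)
            )
        total += slowest
    return total / n_steps


if __name__ == "__main__":
    for coordinated in (False, True):
        t = mean_step_time(256, 500, 1.0, 0.01, 0.5, coordinated)
        label = "coordinated" if coordinated else "uncoordinated"
        print(label, round(t, 3))
```

In this model the chance that an uncoordinated step is delayed grows as 1 - (1 - p)^N with the process count N, while aligning the interruptions keeps it at p regardless of N. That is the intuition behind co-scheduling, though real systems involve interruptions of varying length and frequency that this sketch ignores.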
Citation:
Terry Jones, Shawn Dawson, Rob Neely, William Tuel, Larry Brenner, Jeffrey Fier, Robert Blackmore, Patrick Caffrey, Brian Maskell, Paul Tomlinson, Mark Roberts, "Improving the Scalability of Parallel Jobs by adding Parallel Awareness to the Operating System," SC, p. 10, Proceedings of the 2003 ACM/IEEE conference on Supercomputing, 2003