International Parallel and Distributed Processing Symposium (IPDPS'03)
Optimizing Synchronization Operations for Remote Memory Communication Systems
Nice, France
April 22-April 26
ISBN: 0-7695-1926-1
Synchronization operations, such as fence and locking, are used in many parallel operations that access shared memory. However, a process that is blocked waiting for a fence operation to complete, or for a lock to be acquired, cannot perform useful computation. It is therefore critical that these operations be implemented as efficiently as possible to reduce the time a process sits idle. These operations also affect the scalability of the overall system: as systems grow larger, the number of processes potentially requesting a lock increases. In this paper we describe the design and implementation of an optimized operation that combines a global fence operation with a barrier synchronization operation. We also describe our implementation of an optimized lock algorithm. The optimizations have been incorporated into the ARMCI communication library. The combined global fence and barrier operation is up to 9 times faster than the current implementation on a 16-node system, while the optimized lock implementation is up to 1.25 times faster. These optimizations allow for more efficient and scalable applications.
Citation:
Darius Buntinas, Amina Saify, Dhabaleswar K. Panda, Jarek Nieplocha, "Optimizing Synchronization Operations for Remote Memory Communication Systems," ipdps, pp.199a, International Parallel and Distributed Processing Symposium (IPDPS'03), 2003