18th International Parallel and Distributed Processing Symposium (IPDPS'04) - Papers
Architecture of LA-MPI, A Network-Fault-Tolerant MPI
Santa Fe, New Mexico
April 26-April 30
ISBN: 0-7695-2132-0
We discuss the unique architectural elements of the Los Alamos Message Passing Interface (LA-MPI), a high-performance, network-fault-tolerant, thread-safe MPI library. LA-MPI is designed for use on terascale clusters which are inherently unreliable due to their sheer number of system components and tradeoffs between cost and performance. We examine in detail the design concepts used to implement LA-MPI. These include reliability features, such as application-level checksumming, message retransmission, and automatic message re-routing. Other key performance enhancing features, such as concurrent message routing over multiple, diverse network adapters and protocols, and communication-specific optimizations (e.g., shared memory) are examined.
Citation:
Rob T. Aulwes, David J. Daniel, Nehal N. Desai, Richard L. Graham, L. Dean Risinger, Mark A. Taylor, Timothy S. Woodall, Mitchel W. Sukalski, "Architecture of LA-MPI, A Network-Fault-Tolerant MPI," ipdps, vol. 1, pp.15b, 18th International Parallel and Distributed Processing Symposium (IPDPS'04) - Papers, 2004
Usage of this product signifies your acceptance of the
Terms of Use.
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||