International Parallel and Distributed Processing Symposium (IPDPS'03)
Performance Analysis of a Fault-Tolerant Distributed-Shared-Memory Protocol on the SOME-Bus Multiprocessor Architecture
Nice, France
April 22-26, 2003
ISBN: 0-7695-1926-1
Interconnection networks that allow multiple simultaneous broadcasts are becoming feasible, owing to advances in fiber optics and VLSI technology. Distributed-shared-memory (DSM) implementations on such networks promise high performance even for applications with small granularity. This paper summarizes the architecture of one such implementation, the SOME-Bus, and examines the performance of an augmented DSM protocol that provides fault tolerance by exploiting the natural DSM replication of data to maintain a recovery memory in each processing node. The additional data replication necessary to create fault-tolerant DSM causes no performance reduction and eliminates most of the checkpoint creation overhead. Data blocks duplicated to maintain the recovery memory may also be used by the DSM protocol, reducing network traffic and increasing processor utilization significantly.
Citation:
Diana Hecht, Constantine Katsinis, "Performance Analysis of a Fault-Tolerant Distributed-Shared-Memory Protocol on the SOME-Bus Multiprocessor Architecture," International Parallel and Distributed Processing Symposium (IPDPS'03), p. 213a, 2003