19th IEEE International Conference on Distributed Computing Systems (ICDCS'99)
Reducing Message Overhead in TMR Systems
Austin, Texas
May 31-June 04
ISBN: 0-7695-0222-9
Traditional TMR protocols assume either a single, reliable voter for each Triple-Modular Redundant Unit (TMRU) or triplicated voters (one per processor) for each TMRU. In the first case, the voter is a single point of failure for the system. In the second case, many physical messages must be sent across the communication network for each logical data item. We examine protocols that attempt to maintain the functionality of the triplicated-voter TMR protocol while reducing the number of physical messages required by one third. We also examine possible solutions to the many issues that result from this reduction in communication. Three different Reduced-communication Triple-Modular Redundant (RTMR) protocols are considered, each of which makes different assumptions about the nature of the underlying computation.
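As a rough illustration of the setting (not the RTMR protocols themselves, which the abstract does not specify), the sketch below shows a majority voter over three replica values and a simple message count. It assumes that, in the triplicated-voter case, each of the three sending processors transmits its copy to each of the three receiving voters; that fan-out is an assumption made here for illustration, and the function names `majority_vote` and `physical_messages` are hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(values: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value held by a strict majority of replicas, or None if there is none."""
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) // 2 else None


def physical_messages(senders: int = 3, receivers: int = 3) -> int:
    """Physical messages per logical data item, ASSUMING every sender
    transmits its copy to every receiving voter (full fan-out)."""
    return senders * receivers


# Under the assumed full fan-out, one logical item costs 3 * 3 messages.
full = physical_messages()
# A one-third reduction, as stated in the abstract, applied to that count.
reduced = full - full // 3

if __name__ == "__main__":
    print(majority_vote([7, 7, 9]))   # one faulty replica is masked
    print(majority_vote([1, 2, 3]))   # no majority exists
    print(full, reduced)
```

With fewer copies arriving at each voter, a single faulty message can no longer always be outvoted on the spot, which is presumably the kind of issue the reduced-communication protocols must address.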
Index Terms:
triple modular redundancy, fault masking, reduced communication overhead, voting, distributed systems
Citation:
John C. Ramirez, Rami G. Melhem, "Reducing Message Overhead in TMR Systems," icdcs, p. 45, 19th IEEE International Conference on Distributed Computing Systems (ICDCS'99), 1999