Deterministic Voting in Distributed Systems Using Error-Correcting Codes
August 1998 (vol. 9 no. 8)
pp. 813-824

Abstract: Distributed voting is an important problem in reliable computing. In an N-Modular Redundant (NMR) system, the N computational modules execute identical tasks and must periodically vote on their current states. In this paper, we propose a deterministic majority voting algorithm for NMR systems. The algorithm uses error-correcting codes to drastically reduce the average-case communication complexity. In particular, we show that its efficiency can be improved by choosing the parameters of the error-correcting code to match the probability of computational faults. For example, consider an NMR system with 31 modules, each with a state of m bits, where each module has an independent computational error probability of 10^-3. In this system, our algorithm reduces the average-case communication complexity to approximately 1.0825m, compared with 31m for the naive algorithm in which every module broadcasts its local result to all other modules. We have also implemented the voting algorithm over a network of workstations, and the experimental performance results agree well with the theoretical predictions.
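To make the baseline in the abstract concrete, the following is a minimal sketch of the naive scheme only, not of the paper's coded algorithm. It assumes that every module already holds all N broadcast states, and that cost is counted as one broadcast of m bits per module, which is consistent with the 31m figure quoted above. The function names are hypothetical and chosen for illustration.

```python
from collections import Counter


def majority_vote(states):
    """Return the state held by a strict majority of modules, or None if there is none."""
    value, count = Counter(states).most_common(1)[0]
    return value if count > len(states) // 2 else None


def naive_broadcast_cost(n_modules, m_bits):
    """Bits sent when each module broadcasts its full m-bit state once (assumed cost model)."""
    return n_modules * m_bits


def prob_all_fault_free(n_modules, p_fault):
    """Probability that no module has a computational error, assuming independent faults."""
    return (1.0 - p_fault) ** n_modules


if __name__ == "__main__":
    # 31 modules, two of which hold a corrupted state.
    states = [b"good"] * 29 + [b"bad"] * 2
    print(majority_vote(states))
    print(naive_broadcast_cost(31, 1024))
    print(prob_all_fault_free(31, 1e-3))
```

With the abstract's parameters (31 modules, error probability 10^-3), the last function shows that all modules agree in roughly 97% of voting rounds. This suggests why an algorithm tuned to the fault probability can have an average cost close to m rather than 31m, although the specific 1.0825m figure depends on the code parameters chosen in the paper and is not derived here.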

Index Terms:
NMR system, communication complexity, majority voting, error-correcting codes, MDS code.
Lihao Xu, Jehoshua Bruck, "Deterministic Voting in Distributed Systems Using Error-Correcting Codes," IEEE Transactions on Parallel and Distributed Systems, vol. 9, no. 8, pp. 813-824, Aug. 1998, doi:10.1109/71.706052