Design of Multi-Invariant Data Structures for Robust Shared Accesses in Multiprocessor Systems
March 2001 (vol. 27 no. 3)
pp. 193-207

Abstract: Multiprocessor systems are widely used in many applications to enhance system reliability and performance. However, reliability does not come naturally with multiple processors. We develop a multi-invariant data structure approach to ensure efficient and robust access to shared data structures in multiprocessor systems. Essentially, the data structure is designed to satisfy two invariants: a strong invariant and a weak invariant. The system operates at its peak performance when the strong invariant holds. It still operates correctly when only the weak invariant holds, though perhaps at a lower performance level. The design ensures that the weak invariant always holds in spite of fail-stop processor failures during execution. Because the system is only required to converge to a state satisfying the weak invariant, the overhead of incorporating fault tolerance can be reduced. In this paper, we present the basic idea of multi-invariant data structures. We also develop design rules that systematically convert fault-intolerant data abstractions into corresponding fault-tolerant versions. In this transformation, we augment the data structure and its access algorithms to ensure that the system always converges to the weak invariant, even in the presence of fail-stop processor failures. We also design methods for detecting integrity violations and for restoring the strong invariant. Two data structures, a binary search tree and a doubly linked list, are used to illustrate the concept of multi-invariant data structures.
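
The two-invariant idea can be illustrated with a small sketch on a doubly linked list. This is not the authors' design: the abstract does not state which invariants the paper uses, so the choice below is an assumption made for illustration. Here the weak invariant is taken to be "the forward (`next`) chain from the head reaches the tail", and the strong invariant additionally requires every back (`prev`) pointer to agree with the forward chain. All names (`DList`, `insert_after`, `crash_after_step`, `repair`) are hypothetical, and a fail-stop failure is simulated by raising an exception partway through an update.

```python
class CrashError(Exception):
    """Simulates a fail-stop processor failure in the middle of an update."""


class Node:
    def __init__(self, key):
        self.key = key
        self.next = None
        self.prev = None


class DList:
    """Doubly linked list with head and tail sentinels (illustrative only)."""

    def __init__(self):
        self.head = Node(None)
        self.tail = Node(None)
        self.head.next = self.tail
        self.tail.prev = self.head

    def insert_after(self, p, key, crash_after_step=None):
        # The writes are ordered so that the assumed weak invariant
        # (forward chain intact) holds after every individual step.
        x = Node(key)
        x.next = p.next          # step 1: prepare x; it is not yet reachable
        x.prev = p
        if crash_after_step == 1:
            raise CrashError
        p.next = x               # step 2: one write publishes x on the forward chain
        if crash_after_step == 2:
            raise CrashError     # weak invariant holds, strong does not
        x.next.prev = x          # step 3: fix the back pointer; strong invariant holds
        return x

    def keys(self):
        # Forward traversal relies only on the weak invariant.
        out, n = [], self.head.next
        while n is not self.tail:
            out.append(n.key)
            n = n.next
        return out

    def weak_invariant(self):
        # The forward chain from head reaches tail without looping.
        seen, n = set(), self.head
        while n is not None and id(n) not in seen:
            if n is self.tail:
                return True
            seen.add(id(n))
            n = n.next
        return False

    def strong_invariant(self):
        # Weak invariant plus: every back pointer mirrors the forward chain.
        if not self.weak_invariant():
            return False
        n = self.head
        while n is not self.tail:
            if n.next.prev is not n:
                return False
            n = n.next
        return True

    def repair(self):
        # Restore the strong invariant by rebuilding prev from the forward chain.
        n = self.head
        while n is not self.tail:
            n.next.prev = n
            n = n.next


lst = DList()
a = lst.insert_after(lst.head, 1)
lst.insert_after(a, 3)
try:
    lst.insert_after(a, 2, crash_after_step=2)   # simulated failure mid-insert
except CrashError:
    pass
state_after_crash = (lst.weak_invariant(), lst.strong_invariant(), lst.keys())
lst.repair()
state_after_repair = (lst.weak_invariant(), lst.strong_invariant(), lst.keys())
```

In this sketch a failure after step 2 leaves the list fully usable for forward traversal, while backward traversal may skip the new node until `repair` runs. That mirrors, under the stated assumptions, the abstract's point that the system remains correct under the weak invariant and that the strong invariant can be restored later.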

Index Terms:
Robust data structures, atomic transaction processing, fault-tolerant systems, real-time processing.
Citation:
I-Ling Yen, Farokh B. Bastani, David J. Taylor, "Design of Multi-Invariant Data Structures for Robust Shared Accesses in Multiprocessor Systems," IEEE Transactions on Software Engineering, vol. 27, no. 3, pp. 193-207, March 2001, doi:10.1109/32.910857