A Fault Tolerant Hybrid Memory Structure and Memory Management Algorithms
March 1995 (vol. 44 no. 3)
pp. 408-418

Abstract: This paper proposes a cost-effective fault-tolerant memory structure. It uses the modified status of virtual memory pages as the basis for a system with two classes of memory: one for modified pages and one for unmodified pages. This arrangement is called a hybrid memory system. Results show the cost savings of a hybrid system over a traditional fault-tolerant system. Hybrid virtual memory algorithms are proposed for the system, and the traditional lifetime and space-time measures of virtual memory algorithms are extended to them. The extensions include “cost-weighted” measures, which reflect the fact that the two classes of memory may have different resource allocation constraints. A theoretical result is presented for the effect of combining the hybrid lifetime functions. Finally, a framework for developing hybrid algorithms is presented, with experimental results illustrating the analysis. The lifetime measure for the hybrid policies can show improvements over traditional algorithms.
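
The abstract does not give the paper's definitions, so the following is only an illustrative sketch of the general idea: resident pages are split into a modified class and an unmodified class, and memory use is weighted by a per-class cost. The function name `simulate_hybrid`, the LRU replacement policy, the cost values, and the simplified space-time accounting (one unit of time per reference, no fault service delay) are all assumptions made for illustration, not the authors' algorithms or measures. Lifetime is taken in its usual sense as the mean number of references between page faults.

```python
from collections import OrderedDict


def simulate_hybrid(trace, capacity, cost_modified=2.0, cost_clean=1.0):
    """Replay a reference trace under LRU and report hybrid-style measures.

    trace: iterable of (page, is_write) pairs.
    capacity: maximum number of resident pages (both classes combined).
    cost_modified / cost_clean: hypothetical per-page costs for the
    modified and unmodified memory classes.
    """
    resident = OrderedDict()  # page -> modified flag, oldest entry first
    refs = 0
    faults = 0
    weighted_space_time = 0.0

    for page, is_write in trace:
        refs += 1
        if page in resident:
            # Hit: a page stays in the modified class once it has been written.
            modified = resident.pop(page) or is_write
        else:
            faults += 1
            if len(resident) >= capacity:
                resident.popitem(last=False)  # evict the least recently used page
            modified = is_write
        resident[page] = modified  # reinsert as most recently used

        # Charge each resident page at the cost of its class for this reference.
        n_modified = sum(resident.values())
        n_clean = len(resident) - n_modified
        weighted_space_time += cost_modified * n_modified + cost_clean * n_clean

    lifetime = refs / faults if faults else float("inf")
    return {
        "faults": faults,
        "lifetime": lifetime,
        "weighted_space_time": weighted_space_time,
    }


if __name__ == "__main__":
    example = [(1, False), (2, True), (1, False), (3, False), (2, False)]
    print(simulate_hybrid(example, capacity=2))
```

Setting both costs to 1.0 reduces the weighted figure to an ordinary page-count space-time under the same simplification, which makes it easy to compare a hybrid weighting against an unweighted baseline on the same trace.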

Index Terms:
Checkpoint and rollback recovery, fault tolerance, hybrid memory, memory management, virtual memory.
Dhiraj K. Pradhan, Nicholas S. Bowen, "A Fault Tolerant Hybrid Memory Structure and Memory Management Algorithms," IEEE Transactions on Computers, vol. 44, no. 3, pp. 408-418, March 1995, doi:10.1109/12.372033