Recoverable Distributed Shared Virtual Memory
April 1990 (vol. 39 no. 4)
pp. 460-469

The problem of rollback recovery in distributed shared virtual environments, in which the shared memory is implemented in software in a loosely coupled distributed multicomputer system, is examined. A user-transparent checkpointing recovery scheme and a new twin-page disk storage management technique are presented for implementing recoverable distributed shared virtual memory. The checkpointing scheme can be integrated with the memory coherence protocol for managing the shared virtual memory. The twin-page disk design allows checkpointing to proceed in an incremental fashion without an explicit undo at the time of recovery. The recoverable distributed shared virtual memory allows the system to restart computation from a checkpoint without a global restart.
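
To make the twin-page idea concrete, the following is a minimal in-memory sketch of how such a store might behave. It is not the authors' design: the class `TwinPageDisk`, its methods, and the use of a set of committed checkpoint ids are all assumptions made for illustration. Each logical page has two slots; a write during the current checkpoint interval always goes to the slot that does not hold the last committed copy, so abandoning an interval after a failure requires no undo of page contents.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Slot:
    data: Optional[bytes] = None
    ckpt: int = -1  # id of the checkpoint interval that wrote this copy; -1 = never written


class TwinPageDisk:
    """Hypothetical twin-page store: two slots per logical page."""

    def __init__(self) -> None:
        self.pages: Dict[str, List[Slot]] = {}
        self.current = 1                  # id of the in-progress checkpoint interval
        self.committed_ids: Set[int] = set()  # assumed to be stored atomically on a real disk

    def _valid(self, page_id: str) -> Optional[int]:
        """Index of the slot holding the newest committed copy, or None."""
        slots = self.pages.setdefault(page_id, [Slot(), Slot()])
        best = None
        for i, slot in enumerate(slots):
            if slot.ckpt in self.committed_ids:
                if best is None or slot.ckpt > slots[best].ckpt:
                    best = i
        return best

    def write(self, page_id: str, data: bytes) -> None:
        """Write a tentative copy, never overwriting the committed copy."""
        cur = self._valid(page_id)
        target = 0 if cur is None else 1 - cur
        self.pages[page_id][target] = Slot(data, self.current)

    def commit(self) -> None:
        """Make every copy written in the current interval the valid one."""
        self.committed_ids.add(self.current)
        self.current += 1

    def recover(self) -> None:
        """After a failure, abandon the tentative interval; no page is touched."""
        self.current += 1

    def read(self, page_id: str) -> Optional[bytes]:
        """Return the last committed contents of a page, if any."""
        cur = self._valid(page_id)
        return None if cur is None else self.pages[page_id][cur].data


disk = TwinPageDisk()
disk.write("p", b"v1")
disk.commit()            # checkpoint 1 now holds v1
disk.write("p", b"v2")   # tentative copy goes to the other slot
disk.recover()           # failure before commit: v2 is simply ignored
print(disk.read("p"))
```

In this sketch, checkpointing is incremental because only pages written during an interval consume disk writes, and recovery only abandons the tentative checkpoint id. How the actual scheme records committed checkpoints and coordinates with the coherence protocol is described in the paper itself, not here.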

Index Terms:
virtual memory; rollback recovery; distributed shared virtual environments; loosely coupled distributed multicomputer system; user-transparent checkpointing recovery scheme; twin-page disk storage management technique; memory coherence protocol; distributed processing; storage management; virtual storage.
K.-L. Wu, W.K. Fuchs, "Recoverable Distributed Shared Virtual Memory," IEEE Transactions on Computers, vol. 39, no. 4, pp. 460-469, April 1990, doi:10.1109/12.54839