A Dynamic Coherence Protocol for Distributed Shared Memory Enforcing High Data Availability at Low Costs
September 1996 (vol. 7 no. 9)
pp. 915-930

Abstract—Distributed shared memory (DSM) coherence protocols should scale well to large networks. Large-scale environments contain a greater number of potentially malfunctioning components, so they also need fault tolerance in the form of highly available data access and uninterrupted DSM service. We present a new class of dynamic coherence protocols for DSM systems in error-prone networks whose instances offer highly available access to DSM data at low operation cost. The approach builds on the highly scalable Boundary-Restricted (BR) coherence protocol class. The new protocol class, called the Dynamic Boundary-Restricted (DBR) coherence protocol class, tracks the read/write frequencies of DSM requests at run time. This information is used to dynamically adjust the minimum number of cached copies of a single DSM page in order to guarantee a given degree of data availability. The description of the new protocol class is accompanied by an analysis covering a wide variety of workloads, which quantifies the overall savings achieved by a DBR coherence protocol compared with a static BR protocol.
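The abstract names two ingredients, run-time read/write counts per page and a minimum number of cached copies tied to an availability target, but does not state the DBR rules themselves. The sketch below is a hypothetical illustration of those ingredients only, not the authors' protocol. It assumes that nodes fail independently, each being up with probability `node_availability`, and that a page counts as available if at least one copy is on a live node. The names `min_copies`, `PageStats` and `choose_copies` are invented for this example, and the read-fraction heuristic in `choose_copies` is a made-up stand-in for the paper's cost analysis.

```python
from dataclasses import dataclass


def min_copies(node_availability: float, target: float) -> int:
    """Smallest k with 1 - (1 - p)^k >= target.

    Assumption (not from the paper): independent node failures, and a page
    is available if at least one of its k copies sits on a live node.
    """
    if not 0.0 < node_availability <= 1.0:
        raise ValueError("node_availability must be in (0, 1]")
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in [0, 1)")
    k = 1
    # Availability grows monotonically with k, so a linear search terminates.
    while 1.0 - (1.0 - node_availability) ** k < target:
        k += 1
    return k


@dataclass
class PageStats:
    """Hypothetical per-page counters of observed reads and writes."""
    reads: int = 0
    writes: int = 0

    def record(self, is_write: bool) -> None:
        if is_write:
            self.writes += 1
        else:
            self.reads += 1

    def read_fraction(self) -> float:
        total = self.reads + self.writes
        # With no history yet, treat reads and writes as equally likely.
        return self.reads / total if total else 0.5


def choose_copies(stats: PageStats, n_nodes: int,
                  node_availability: float, target: float) -> int:
    """Pick a copy count between the availability floor and n_nodes.

    Illustrative heuristic (an assumption, not the DBR rule): read-heavy
    pages get extra copies for local reads, write-heavy pages stay near
    the floor so fewer copies must be updated or invalidated.
    """
    floor = min_copies(node_availability, target)
    if floor > n_nodes:
        raise ValueError("target availability unreachable with n_nodes")
    extra = round(stats.read_fraction() * (n_nodes - floor))
    return floor + extra


if __name__ == "__main__":
    stats = PageStats()
    for is_write in [False, False, False, True]:
        stats.record(is_write)
    print(min_copies(0.9, 0.995), choose_copies(stats, 8, 0.9, 0.995))
```

With nodes up 90% of the time and a 99.5% availability target, the floor under this model is three copies; a page that sees three reads per write is then given seven of eight possible copies by the heuristic.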

Index Terms:
Distributed systems, fault-tolerance, availability, distributed shared memory, dynamic coherence protocols, adaptability, stochastic modeling.
Citation:
Oliver E. Theel, Brett D. Fleisch, "A Dynamic Coherence Protocol for Distributed Shared Memory Enforcing High Data Availability at Low Costs," IEEE Transactions on Parallel and Distributed Systems, vol. 7, no. 9, pp. 915-930, Sept. 1996, doi:10.1109/71.536936