|
| This Article | ||
| ||
| Share | ||
| Bibliographic References | ||
| Add to: | ||
| | ||
| Search | ||
| ||
| ASCII Text | x | ||
| Christine Morin, Anne-Marie Kermarrec, Michel Banâtre, Alain Gefflaut, "An Efficient and Scalable Approach for Implementing Fault-Tolerant DSM Architectures," IEEE Transactions on Computers, vol. 49, no. 5, pp. 414-430, May, 2000. | |||
| BibTex | x | ||
| @article{ 10.1109/12.859537, author = {Christine Morin and Anne-Marie Kermarrec and Michel Banâtre and Alain Gefflaut}, title = {An Efficient and Scalable Approach for Implementing Fault-Tolerant DSM Architectures}, journal ={IEEE Transactions on Computers}, volume = {49}, number = {5}, issn = {0018-9340}, year = {2000}, pages = {414-430}, doi = {http://doi.ieeecomputersociety.org/10.1109/12.859537}, publisher = {IEEE Computer Society}, address = {Los Alamitos, CA, USA}, } | |||
| RefWorks Procite/RefMan/Endnote | x | ||
| TY - JOUR JO - IEEE Transactions on Computers TI - An Efficient and Scalable Approach for Implementing Fault-Tolerant DSM Architectures IS - 5 SN - 0018-9340 SP414 EP430 EPD - 414-430 A1 - Christine Morin, A1 - Anne-Marie Kermarrec, A1 - Michel Banâtre, A1 - Alain Gefflaut, PY - 2000 KW - Distributed shared memory KW - fault tolerance KW - coherence protocol KW - backward error recovery KW - scalability KW - performance KW - coma KW - svm. VL - 49 JA - IEEE Transactions on Computers ER - | |||
Abstract—Distributed Shared Memory (
[1] P.A. Lee and T. Anderson, Fault Tolerance: Principles and Practice, second revised ed. Springer Verlag, 1990.
[2] E. Hagersten, A. Landin, and S. Haridi,“DDM—A cache-only memory architecture,”IEEE Comput. Mag., vol. 25, pp. 44–54, Sept. 1992.
[3] J. Franck, H. Burkhard III, and J. Rothnie, “The KSR1: Bridging the Gap between Shared Memory and MMPs,” Proc. Compcon93 38th IEEE CS Int'l Conf., pp. 285-294, Feb. 1993.
[4] Intel Corporation, Paragon User's Guide, 1993.
[5] T.E. Anderson, D.E. Culler, and D.A. Patterson, “The Berkeley Networks of Workstations (NOW) Project,” Proc. COMP Spring, pp. 322-326, 1995.
[6] C. Thacker, L. Stewart, and E. Satterthwaite, “Firefly: A Multiprocessor Workstation,” IEEE Trans. Computers, vol. 37, no. 8, pp. 909-920, Aug. 1988.
[7] R.H. Katz et al., "Implementing a Cache Consistency Protocol," Proc. 12th Ann. Int'l Symp. Computer Architecture, June 1985, pp. 158-166.
[8] M. Banâtre, A. Gefflaut, P. Joubert, C. Morin, and P.A. Lee, “An Architecture for Tolerating Processor Failures in Shared-Memory Multiprocessors,” IEEE Trans. Computers, vol. 45, no. 10, Oct. 1996.
[9] M.D. Cin, A. Grygier, H. Hessenauer, U. Hildebrand, J. Höonig, W. Hohl, E. Michel, and A. Pataricza, “Fault Tolerance in Distributed Shared Memory Multiprocessors,” Parallel Computer Architectures, pp. 31-48, Springer-Verlag, 1994.
[10] P.A. Bernstein,"Sequoia: A Fault-Tolerant Tightly Coupled Multiprocessor for Transaction Processing," Computer, pp. 37-45, Feb. 1988.
[11] K.-L. Wu and W.K. Fuchs, "Recoverable Distributed Shared Virtual Memory," IEEE Trans. Computers, vol. 39, no. 4, pp. 460-469, Apr. 1990.
[12] B.D. Fleisch, “Reliable Distributed Shared Memory,” Proc. Second Workshop Experimental Distributed Systems, pp. 102-105, 1990.
[13] M. Stumm and S. Zhou, "Fault Tolerant Distributed Shared Memory Algorithms," Proc. Second IEEE Symp. Parallel and Distributed Processing, pp. 719-724. Dec. 1990.
[14] T.J. Wilkinson, “Implementing Fault Tolerance in a 64-bit Distributed Operating System,” PhD thesis, City Univ., London, July 1993.
[15] E.N. Elnozahy, D.B. Johnson, and W. Zwaenepoel, "The Performance of Consistent Checkpointing," Proc. 11th Symp. Reliable Distributed Systems, pp. 86-95, Oct. 1992.
[16] K.L. Wu, W.K. Fuchs, and J.H. Patel, "Error Recovery in Shared Memory Multiprocessors Using Private Caches," IEEE Trans. Parallel and Distributed Systems, vol. 1, no. 2, pp. 231-240, Apr. 1990.
[17] D. Chaiken et al., “Directory-Based Cache Coherence in Large Scale Multiprocessors,” Computer, vol. 23, no. 6, pp. 49-58, June 1990.
[18] P. Stenstrom, T. Joe, and A. Gupta, "Comparative Performance Evaluation of Cache-Coherent NUMA and COMA Architectures," Proc. 19th Int'l Symp. Computer Architecture, IEEE CS Press, Los Alamitos, Calif., 1992, pp. 80-91.
[19] A. Saulsbury et al., "An Argument for Simple COMA," Proc. 1st Symp. High-Performance Computer Architecture, IEEE CS Press, Los Alamitos, Calif., 1995, pp. 276-285.
[20] K. Li and P. Hudak, "Memory Coherence in Shared Virtual Memory Systems," ACM Trans. Computer Surveys, vol. 7, no. 4, Nov. 1989.
[21] Z. Lahjomri and T. Priol, “Koan: A Shared Virtual Memory for the ipsc/2 Hypercube,” Proc. Compar/VAAP92, Sept. 1992.
[22] C. Amza, A.L. Cox, S. Dwarkadas, P. Keleher, H. Lu, R. Rajamony, W. Yu, and W. Zwaenepoel, “TreadMarks: Shared Memory Computing on Networks of Workstations,” Computer, vol. 29, no. 2, Feb. 1996.
[23] A.-M. Kermarrec, “Une approche globale fondée sur la réplication pour la disponibilitéet l'efficacitédes systèmes extensiblesàmémoire partagée,” PhD thesis, Universitéde Rennes 1, 1996.
[24] M. Rozier, V. Abrossimov, F. Armand, I. Boule, M. Gien, M. Guillemont, F. Hermann, P. Leonard, S. Langlois, and W. Neuhauser, “Chorus Distributed Operating Systems,” Computing Systems, vol. 1, no. 4, pp. 305-370, Oct. 1988.
[25] A. Gefflaut, C. Morin, and M. Banâtre, “Tolerating Node Failures in Cache Only Memory Architectures,” Proc. Supercomputing '94, Nov. 1994.
[26] A.-M. Kermarrec, “Contrôle de la réplication des données dans une mémoire virtuelle partagée recouvrable efficace,” Technique et Science Informatiques, vol. 15, no. 5, May 1996.
[27] C. Morin, A. Gefflaut, M. Banâtre, and A.-M. Kermarrec, “COMA: An Opportunity for Building Fault-Tolerant Scalable Shared Memory Multiprocessors,” Proc. 23rd Ann. Int'l Symp. Computer Architecture, May 1996.
[28] A. Gefflaut and P. Joubert, “Spam: A Multiprocessor Execution Driven Simulation Kernel,” Int'l J. Computer Simulation, vol. 6, no. 1, pp. 69-88, Jan. 1996.
[29] H. Davis, S.R. Goldsmidt, and J. Hennessy, “Multiprocessor Simulation Using Tango,” Proc. 1991 Int'l Conf. Parallel Processing, Aug. 1991.
[30] J. Larus, “Abstract Execution: A Technique for Efficiently Tracing Programs,” Software Practice and Experience, vol. 20, no. 12, pp. 1,251-1,258, Dec. 1990.
[31] H. Schwetman, “Csim User's Guide,” Rev. 2 ACT-126-90, MCC, 1992.
[32] J.P. Singh, W.-D. Weber,, and A. Gupta, “Splash: Stanford Parallel Applications for Shared Memory,” Technical Report CSL-TR-91-469, Stanford Univ., Apr. 1991.
[33] S.C. Woo et al., "The SPLASH-2 Programs: Characterization and Methodological Considerations," Proc. 22nd Annual Int'l Symp. Computer Architecture, IEEE CS Press, Los Alamitos, Calif., June 1995, pp. 24-36.
[34] A. Gefflaut, “Proposition etévaluation d'une architecture multiprocesseur extensibleàmémoire partagée tolérante aux fautes,” PhD thesis, Rennes Univ., Jan. 1995.
[35] M. Heinrich et al. “The Stanford FLASH Multiprocessor,” Proc. 21th Int'l Symp. Computer Architecture, pp. 302-313, April 1994.
[36] A. Kermarrec, G. Cabillic, A. Gefflaut, C. Morin, and I. Puaut, “A Recoverable Distributed Shared Memory Integrating Coherency and Recoverability,” Proc. 25th Int'l Symp. Fault-Tolerant Computing Systems (FTCS-25), June 1995.
[37] A.-M. Kermarrec, C. Morin, and M. Banâtre, “Design, Implementation and Evaluation of Icare: An Efficient Recoverable DSM,” Software Practice and Experience, vol. 28, no. 9, pp. 981-1,001, July 1998.
[38] C. Morin and I. Puaut, "A Survey of Recoverable Distributed Shared Virtual Memory Systems," IEEE Trans. Parallel and Distributed Systems, Vol. 8, No. 9, Sept. 1997, pp. 959-969.
[39] N. Neves, M. Castro, and P. Guedes, "A Checkpoint Protocol for an Entry Consistent Shared Memory System," Proc. 13th ACM Symp. Principles of Distributed Computing, Aug. 1994.
[40] M. Costa, P. Guedes, M. Sequeira, N. Neves, and M. Castro, "Lightweight Logging for Lazy Release Consistent Distributed Shared Memory," Proc. Second Symp. Operating Systems Design and Implementation, Oct. 1996.
[41] A.-M. Kermarrec and C. Morin, An Efficient Recoverable DSM on a Network of Workstations: Design and Implementation, chapter 7, pp. 123-153, Kluwer Academic, 1997.
[42] W. Bolosky, R. Fitzgerald, and M. Scott, “Simple But Effective Techniques for NUMA Memory Management,” Proc. 12th ACM Symp. Operating Systems Principles, Dec. 1989.
[43] A. Cox and R. Fowler, “The Implementation of a Coherent Memory Abstraction on a NUMA Multiprocessor: Experiences with Platinum,” Proc. 12th ACM Symp. Operating Systems Principles, Dec. 1989.

