Claudio Basile, Zbigniew Kalbarczyk, Ravishankar K. Iyer, "Active Replication of Multithreaded Applications," IEEE Transactions on Parallel and Distributed Systems, vol. 17, no. 5, pp. 448-465, May 2006. IEEE Computer Society, Los Alamitos, CA, USA. ISSN 1045-9219. DOI: http://doi.ieeecomputersociety.org/10.1109/TPDS.2006.56

Keywords: fault tolerance, replication, multithreading, nondeterminism, fault injection.

Abstract: Software-based active replication carries a high performance overhead. Multithreading can improve performance; however, thread scheduling introduces nondeterminism into replica behavior. To achieve strong replica consistency in multithreaded environments, this paper proposes intercepting the mutex lock/unlock operations that threads perform when accessing shared data, and contributes two algorithmic solutions: 1) a
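
The idea of intercepting mutex operations to remove scheduling nondeterminism can be illustrated with a minimal sketch. This is not the paper's algorithm (the abstract is cut off before the two solutions are named); it is a generic record-and-replay illustration under simplifying assumptions. The names `RecordingMutex`, `ReplayMutex`, and `run_replica` are hypothetical, logical thread ids are passed explicitly, and both "replicas" run in one process, whereas a real system would ship the recorded order between machines.

```python
import threading


class RecordingMutex:
    """Hypothetical leader-side wrapper: logs the order of lock acquisitions."""

    def __init__(self):
        self._lock = threading.Lock()
        self.order = []  # logical thread ids, in acquisition order

    def acquire(self, tid):
        self._lock.acquire()
        self.order.append(tid)  # recorded while the lock is held

    def release(self):
        self._lock.release()


class ReplayMutex:
    """Hypothetical follower-side wrapper: grants the lock in a recorded order."""

    def __init__(self, order):
        self._order = list(order)
        self._next = 0        # index of the next acquisition to grant
        self._held = False
        self._cond = threading.Condition()

    def acquire(self, tid):
        with self._cond:
            # Block until the lock is free and it is this thread's turn.
            self._cond.wait_for(
                lambda: not self._held
                and self._next < len(self._order)
                and self._order[self._next] == tid
            )
            self._held = True

    def release(self):
        with self._cond:
            self._held = False
            self._next += 1
            self._cond.notify_all()


def run_replica(mutex, n_threads, rounds):
    """Run workers that update shared state only inside the intercepted mutex."""
    shared = []

    def worker(tid):
        for _ in range(rounds):
            mutex.acquire(tid)
            shared.append(tid)  # critical section on shared data
            mutex.release()

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared


leader = RecordingMutex()
leader_state = run_replica(leader, n_threads=3, rounds=4)
follower_state = run_replica(ReplayMutex(leader.order), n_threads=3, rounds=4)
print(leader_state == follower_state)
```

Because every update to the shared list happens under the intercepted mutex, forcing the follower to grant the lock in the leader's order makes both runs end with the same shared state, whatever the OS scheduler does.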

