Gunjan Khanna, Mike Yu Cheng, Padma Varadharajan, Saurabh Bagchi, Miguel P. Correia, Paulo J. Veríssimo, "Automated Rule-Based Diagnosis through a Distributed Monitor System," IEEE Transactions on Dependable and Secure Computing, vol. 4, no. 4, pp. 266-279, October-December 2007.
DOI: http://doi.ieeecomputersociety.org/10.1109/TDSC.2007.70211
ISSN: 1545-5971
Publisher: IEEE Computer Society, Los Alamitos, CA, USA
Keywords: Distributed system diagnosis, runtime monitoring, hierarchical Monitor system, fault injection based evaluation

