Towards Self-Stabilizing Operating Systems
July/August 2008 (vol. 34 no. 4)
pp. 564-576
Shlomi Dolev, Ben-Gurion University of the Negev, Beer-Sheva
Reuven Yagel, Ben-Gurion University of the Negev, Beer-Sheva
This work presents three approaches for designing self-stabilizing operating systems. The first approach is based on periodic, automatic reinstallation and restart of the operating system. The second reinstalls only the executable portion of the operating system and uses predicates on the operating system state (the contents of its variables) to ensure that the operating system does not diverge from its specification. The last approach presents an example of a tailored, very tiny self-stabilizing operating system. Prototypes were built for the Intel Pentium processor.
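To make the difference between the first two approaches concrete, here is a minimal simulation sketch in Python. It assumes a toy machine whose executable is a byte array with a pristine read-only copy (`ROM_IMAGE`) and whose state is a small set of integer variables. All names (`ROM_IMAGE`, `DEFAULTS`, `PREDICATES`, `Machine`, `reinstall_and_restart`, `reinstall_and_check`) are hypothetical illustrations, not the paper's code; a real implementation would run at the processor level and be triggered periodically, for example by a hardware timer.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical pristine copy of the OS executable (stands in for ROM).
ROM_IMAGE: bytes = bytes(range(16))

# Hypothetical OS variables and the safe value each one is reset to on repair.
DEFAULTS: Dict[str, int] = {"current_task": 0, "num_tasks": 4, "tick": 0}

# Consistency predicates: every one must hold in a legal OS state.
PREDICATES: Dict[str, Callable[[Dict[str, int]], bool]] = {
    "num_tasks": lambda s: 1 <= s["num_tasks"] <= 8,
    "current_task": lambda s: 0 <= s["current_task"] < s["num_tasks"],
    "tick": lambda s: s["tick"] >= 0,
}

# Check order matters: current_task's predicate depends on num_tasks.
CHECK_ORDER: List[str] = ["num_tasks", "current_task", "tick"]


@dataclass
class Machine:
    code: bytearray = field(default_factory=lambda: bytearray(ROM_IMAGE))
    state: Dict[str, int] = field(default_factory=lambda: dict(DEFAULTS))


def reinstall_and_restart(m: Machine) -> None:
    """Approach 1 (sketch): overwrite both code and state with pristine copies."""
    m.code[:] = ROM_IMAGE
    m.state = dict(DEFAULTS)


def reinstall_and_check(m: Machine) -> List[str]:
    """Approach 2 (sketch): reinstall only the code, then reset any
    variable whose predicate fails. Returns the names that were repaired."""
    m.code[:] = ROM_IMAGE
    repaired = []
    for name in CHECK_ORDER:
        if not PREDICATES[name](m.state):
            m.state[name] = DEFAULTS[name]
            repaired.append(name)
    return repaired


if __name__ == "__main__":
    m = Machine()
    m.code[3] = 0xFF             # simulated soft error in the executable
    m.state["num_tasks"] = 2
    m.state["current_task"] = 7  # illegal: not below num_tasks
    print(reinstall_and_check(m))  # ['current_task']
```

The second routine keeps the legal parts of the state (here `num_tasks` and `tick`), whereas the first discards all state on every period. In this sketch the defaults are chosen so that a repaired state always satisfies every predicate, which is what lets a single pass return the system to a legal configuration.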

Index Terms:
Distributed objects, Fault tolerance
Shlomi Dolev, Reuven Yagel, "Towards Self-Stabilizing Operating Systems," IEEE Transactions on Software Engineering, vol. 34, no. 4, pp. 564-576, July-Aug. 2008, doi:10.1109/TSE.2008.46